Antigens and Their Detection

TECHNICAL FIELD

The invention relates to novel nucleotide sequences located in a gene which encodes a bacterial flagellin antigen, and to the use of those nucleotide sequences for the detection of bacteria which express particular flagellin antigens, either on the basis of that antigen alone or in conjunction with the O antigen expressed by that strain.

BACKGROUND ART 

The flagellum of many bacteria appears to be made up of a single protein known as flagellin. The serotyping schemes of E. coli and Salmonella enterica are based on highly variable antigenic surface structures, which include the lipopolysaccharide that carries the O antigen and the flagellin that is now known to be the carrier of the classical H antigen. In many strains of S. enterica there are two loci (fliC and fljB) which encode flagellin, and a regulatory system which allows only one of them to be expressed at any time and which also allows expression to alternate rapidly between the two forms, first identified as two phases (H1 and H2) of the H antigen of most strains. In E. coli, 54 forms of H antigen are recognised, and until recently they were all thought to be encoded at the fliC locus, as has been shown for E. coli K-12. However, in the 1980s Ratiner [Ratiner Y A 'Phase variation of the H antigen in Escherichia coli strain Bi327-41, the standard strain for Escherichia coli flagellin antigen H3' FEMS Microbiol. Lett. 15 (1982) 33-36; Ratiner Y A 'Presence of two structural genes determining antigenically different phase-specific flagellins in some Escherichia coli strains' FEMS Microbiol. Lett. 19 (1983) 37-41; Ratiner Y A 'Two genetic arrangements determining flagellin antigen specificities in two diphasic Escherichia coli strains' FEMS Microbiol. Lett. 29 (1985) 317-323; Ratiner Y A 'Different alleles of the flagellin gene hagB in Escherichia coli standard H test strains' FEMS Microbiol. Lett. 48 (1987) 97-104] showed that in some cases there are two loci and that expression can alternate. The matter was further complicated by a recent paper by Ratiner [Ratiner Y A (1998) 'New flagellin-specifying genes in some Escherichia coli strains' J. Bacteriol. 180 979-984] showing three loci (flk, fll and flm) for flagellin in addition to fliC, although the fljB locus has not been found in E. coli. Nevertheless, E. coli strains are normally identified by the combination of one O antigen and one H antigen (and a K antigen when a capsular (K) antigen is present), with no problems reported for the vast majority of cases with alternate phases, while S. enterica strains are normally identified by the combination of O, H1 and H2 antigens. It is still not clear how widespread in E. coli are H antigens determined by flagellin genes other than fliC.

Typing is typically carried out using specific antisera. The incidence of pathogenic E. coli in association with human and animal disease supports the need for suitable and rapid typing techniques.

DESCRIPTION OF THE INVENTION

In a first aspect, the present invention provides a 
novel nucleic acid molecule encoding all or part of an E. 
coli flagellin protein. 

The present invention provides, for the first time, the full-length sequence of a flagellin gene for the following E. coli serotypes: H6, H9, H10, H14, H18, H23, H51, H45, H49, H19, H30, H32, H26, H41, H15, H16, H20, H28, H46, H31, H34, H43 and H52. Corrected full-length sequences have been obtained for H7 and H12.
Partial flagellin gene sequence, including the central variable region, has been obtained for the following E. coli H serotypes: H46, H8, H21, H47, H11, H17, H25, H42, H27, H35, H2, H3, H24, H37, H50, H4, H44, H38, H39, H55, H29, H33, H5, H54 and H56. Comparison of the sequences shows that unique flagellin genes have now been sequenced (partially or completely) for the following E. coli H serotypes: H2, H3, H5, H6, H7, H9, H11, H14, H18, H19, H20, H21, H23, H24, H25, H26, H27, H28, H29, H30, H31, H32, H33, H34, H35, H37, H38, H39, H41, H42, H43, H45, H46, H48, H49, H51, H52, H54 and H56, and for either H8 or H40, H15 or H16, H1 or H12, H10 or H50, and H4 or H17.

By comparing these sequences, the present inventors were able to identify specific sequences for each of the above H serotypes.

The present invention also provides fliC sequences from 10 different H7 strains, in addition to that from the H7 typing strain, and two sequences specific to the H7 of O157 and O55 E. coli strains.

The present invention encompasses all or part of the unique genes sequenced for H2, H3, H5, H6, H9, H11, H14, H18, H19, H20, H21, H23, H24, H25, H26, H27, H28, H29, H30, H31, H32, H33, H34, H35, H37, H38, H39, H41, H42, H43, H45, H46, H48, H49, H51, H52, H54 and H56, and either H8 or H40, H15 or H16, H10 or H50, and H4 or H17. The invention also encompasses the newly provided sequences for H7 and H12, as well as novel primers for the specific amplification of H1, H7, H12 and H48 and of the other newly sequenced flagellin genes mentioned above.

The nucleic acid molecules of the invention may vary in length. In one embodiment they are oligonucleotides of from about 10 to about 20 nucleotides in length. The oligonucleotides of the invention are specific for the flagellin gene from which they are derived and are derived from the central region of the gene. In one embodiment, oligonucleotides in accordance with the present invention, which also include oligonucleotides from the previously sequenced E. coli H1, H7, H12 and H48 genes, are those shown in Table 3.
The 44 sequences (see Table 3) provide a panel against which newly sequenced genes can be compared in order to select oligonucleotides specific for those newly sequenced genes.
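As an illustration of how such a panel comparison might be screened, the sketch below lists every exact k-mer of a target sequence that does not occur on either strand of any other panel sequence. It is a naive exact-match filter, not the inventors' selection procedure: real probe design would also weigh mismatches, melting temperature and secondary structure. The function names and the default k of 18 (within the 10 to 20 nucleotide range given above) are assumptions.

```python
# Hypothetical sketch: candidate probes = k-mers present in one flagellin
# sequence but absent from both strands of every other panel sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(target: str, panel: dict, k: int = 18) -> list:
    """k-mers of target absent from every panel sequence, in target order."""
    target = target.upper()
    background = set()
    for other in panel.values():
        other = other.upper()
        # A probe must not match either strand of any other panel gene.
        background |= kmers(other, k) | kmers(revcomp(other), k)
    result, seen = [], set()
    for i in range(len(target) - k + 1):
        word = target[i:i + k]
        if word not in background and word not in seen:
            seen.add(word)
            result.append(word)
    return result


if __name__ == "__main__":
    # Toy sequences only; real inputs would be the panel of flagellin genes.
    print(specific_kmers("AAAACCCCGGGG", {"other": "AAAACCCCTTTT"}, k=4))
```

With real data the target would be a newly sequenced flagellin gene and the panel would hold the other sequences, keyed by H type.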

In a second aspect the invention provides a method of detecting the presence of E. coli of a particular H serotype in a sample, the method comprising the steps of specifically hybridising at least one nucleic acid molecule derived from a flagellin gene, wherein the at least one nucleic acid molecule is specific for a particular flagellin gene associated with the H serotype, to any E. coli in the sample which contain the gene, and detecting any specifically hybridised nucleic acid molecules, wherein the presence of specifically hybridised nucleic acid molecules identifies the presence of the H serotype in the sample.

In one preferred embodiment the detection method is a Southern blot method. More preferably, the nucleic acid molecule is labelled and hybridisation of the nucleic acid molecule is detected by autoradiography or detection of fluorescence.

Preferred nucleic acid molecules for the detection of particular flagellin genes are listed in Table 3.

In a third aspect the invention provides a method of detecting the presence of E. coli of a particular H serotype in a sample, the method comprising the steps of specifically hybridising at least one pair of nucleic acid molecules to any E. coli in the sample which contain the flagellin gene for the particular H serotype, wherein at least one of the nucleic acid molecules is specific for the particular flagellin gene associated with the H serotype, and detecting any specifically hybridised nucleic acid molecules, wherein the presence of specifically hybridised nucleic acid molecules identifies the presence of the H serotype in the sample.

In one preferred embodiment the detection method is a polymerase chain reaction method. More preferably, the nucleic acid molecules are labelled and hybridisation of the nucleic acid molecules is detected by electrophoresis.

It is recognised that spurious hybridisation may occasionally arise because the sequence initially selected is found in many different genes. This is typically recognisable, for instance, by comparing band sizes against controls in PCR gels, and an alternative sequence can then be selected.
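The band-size comparison mentioned above can be expressed as a simple check, sketched here under stated assumptions: the expected product size is taken from a positive control, and any band more than a tolerance away from it is flagged as possibly spurious. The 5% tolerance and the function names are hypothetical, not values from this specification.

```python
# Hypothetical sketch: flag gel bands whose size disagrees with the
# positive-control product.  The 5% tolerance is an assumed value.


def band_matches(observed_bp: int, expected_bp: int, tolerance: float = 0.05) -> bool:
    """True if an observed band lies within +/- tolerance of the expected size."""
    return abs(observed_bp - expected_bp) <= tolerance * expected_bp


def flag_spurious(observed_bands: list, expected_bp: int, tolerance: float = 0.05) -> list:
    """Return the bands that do not match the control product size."""
    return [b for b in observed_bands if not band_matches(b, expected_bp, tolerance)]


if __name__ == "__main__":
    # Band sizes here are illustrative only.
    print(flag_spurious([500, 510, 900], expected_bp=500))
```

A flagged band would prompt selection of an alternative primer or probe sequence, as described above.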

In a fourth aspect the invention provides a method for detecting the presence of a particular O serotype and H serotype of E. coli in a sample, the method comprising the following steps:

(a) specifically hybridising at least one nucleic acid molecule, derived from and specific for a gene encoding a transferase or a gene encoding an enzyme for the transport or processing of a polysaccharide or oligosaccharide unit, the gene being involved in the synthesis of a particular E. coli O antigen, to any E. coli in the sample which contain the gene;

(b) specifically hybridising at least one nucleic acid molecule derived from and specific for a particular flagellin gene associated with that H serotype, to any E. coli in the sample which contain the gene; and

(c) detecting any specifically hybridised nucleic acid molecules.

Preferred nucleic acid molecules for the detection of particular flagellin genes are listed in Table 3.

In one preferred embodiment, the sequence of the nucleic acid molecule specific for the O antigen is specific to the nucleotide sequence encoding the O111 antigen. More preferably, the sequence is derived from a gene selected from the group consisting of wbdH (nucleotide positions 739 to 1932 of Figure 5), wzx (nucleotide positions 8646 to 9911 of Figure 5), wzy (nucleotide positions 9901 to 10953 of Figure 5), wbdM (nucleotide positions 11821 to 12945 of Figure 5) and fragments of those molecules of at least 10-12 nucleotides in length. Particularly preferred nucleic acid molecules are those set out in Tables 8 and 8A with respect to the above-mentioned genes.

In another preferred embodiment, the sequence of the nucleic acid molecule specific for the O antigen is specific to the nucleotide sequence encoding the O157 antigen. More preferably, the sequence is derived from a gene selected from the group consisting of wbdN (nucleotide positions 79 to 861 of Figure 6), wbdO (nucleotide positions 2011 to 2757 of Figure 6), wbdP (nucleotide positions 5257 to 6471 of Figure 6), wbdR (nucleotide positions 13156 to 13821 of Figure 6), wzx (nucleotide positions 2744 to 4135 of Figure 6) and wzy (nucleotide positions 858 to 2042 of Figure 6) and fragments of those molecules of at least 10-12 nucleotides in length. Particularly preferred nucleic acid molecules are those set out in Tables 9 and 9A with respect to the above-mentioned genes.
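The gene positions above can be used directly to cut a gene, or a fragment of it, out of the cluster sequences in Figures 5 and 6. The sketch below assumes the positions are 1-based and inclusive, which is consistent with every listed range spanning a whole number of codons. The cluster sequences themselves are not reproduced here, and the helper names are hypothetical.

```python
# Gene positions quoted in the text (assumed 1-based, inclusive, relative to
# the sequences printed in Figures 5 and 6).
O111_GENES = {
    "wbdH": (739, 1932),
    "wzx": (8646, 9911),
    "wzy": (9901, 10953),
    "wbdM": (11821, 12945),
}
O157_GENES = {
    "wbdN": (79, 861),
    "wbdO": (2011, 2757),
    "wbdP": (5257, 6471),
    "wbdR": (13156, 13821),
    "wzx": (2744, 4135),
    "wzy": (858, 2042),
}


def gene_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def extract_gene(cluster_seq: str, start: int, end: int) -> str:
    """Slice a gene from a cluster sequence using 1-based inclusive positions."""
    return cluster_seq[start - 1:end]


if __name__ == "__main__":
    for cluster, genes in (("O111", O111_GENES), ("O157", O157_GENES)):
        for name, (start, end) in genes.items():
            n = gene_length(start, end)
            # A complete open reading frame spans a whole number of codons.
            print(f"{cluster} {name}: {n} nt, whole codons: {n % 3 == 0}")
```

A fragment of at least 10-12 nucleotides could then be taken from the extracted gene with an ordinary slice.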

In one preferred embodiment the detection method is a Southern blot method. More preferably, the nucleic acid molecule is labelled and hybridisation of the nucleic acid molecule is detected by autoradiography or detection of fluorescence.

In a fifth aspect the invention provides a method for detecting the presence of a particular O serotype and H serotype of E. coli in a sample, the method comprising the following steps:

(a) specifically hybridising at least one pair of nucleic acid molecules, at least one of which is derived from and specific for a gene encoding a transferase or a gene encoding an enzyme for the transport or processing of a polysaccharide or oligosaccharide unit, the gene being involved in the synthesis of the particular E. coli O antigen, to any E. coli in the sample which contain the gene;

(b) specifically hybridising at least one pair of nucleic acid molecules, at least one of which is derived from and specific for a particular flagellin gene associated with the particular H serotype, to any E. coli in the sample which contain the gene; and

(c) detecting any specifically hybridised nucleic acid molecules.

Preferred nucleic acid molecules for the detection of particular flagellin genes are listed in Table 3.

In one preferred embodiment, the sequence of the nucleic acid molecule specific for the O antigen is specific to the nucleotide sequence encoding the O111 antigen. More preferably, the sequence is derived from a gene selected from the group consisting of wbdH (nucleotide positions 739 to 1932 of Figure 5), wzx (nucleotide positions 8646 to 9911 of Figure 5), wzy (nucleotide positions 9901 to 10953 of Figure 5), wbdM (nucleotide positions 11821 to 12945 of Figure 5) and fragments of those molecules of at least 10-12 nucleotides in length. Particularly preferred nucleic acid molecules are those set out in Tables 8 and 8A with respect to the above-mentioned genes.

In another preferred embodiment, the sequence of the nucleic acid molecule specific for the O antigen is specific to the nucleotide sequence encoding the O157 antigen. More preferably, the sequence is derived from a gene selected from the group consisting of wbdN (nucleotide positions 79 to 861 of Figure 6), wbdO (nucleotide positions 2011 to 2757 of Figure 6), wbdP (nucleotide positions 5257 to 6471 of Figure 6), wbdR (nucleotide positions 13156 to 13821 of Figure 6), wzx (nucleotide positions 2744 to 4135 of Figure 6) and wzy (nucleotide positions 858 to 2042 of Figure 6) and fragments of those molecules of at least 10-12 nucleotides in length. Particularly preferred nucleic acid molecules are those set out in Tables 9 and 9A with respect to the above-mentioned genes.

In one preferred embodiment the detection method is a polymerase chain reaction method. More preferably, the nucleic acid molecules are labelled and hybridisation of the nucleic acid molecules is detected by electrophoresis.

The present inventors believe that, based on the teachings of the present invention and available information concerning O antigen gene clusters, nucleic acid molecules in accordance with the invention can readily be derived for any particular O antigen of interest through experimental analysis and comparison of nucleic acid sequences or predicted protein structures. Suitable bacterial strains can typically be acquired commercially from depositary institutions.

There are currently 166 defined E. coli O antigens. Samples of the 166 different E. coli O antigen serotypes are available from Statens Serum Institut, Copenhagen, Denmark.

The inventors envisage rare circumstances in which two genetically similar gene clusters encoding serologically different O antigens have arisen through recombination of genes or mutation, generating polymorphic variants. In these circumstances multiple pairs of oligonucleotides may be selected to provide hybridisation to the specific combination of genes. The invention thus envisages the use of a panel containing multiple nucleic acid molecules for use in the method of testing for O antigen in conjunction with H antigen, wherein the nucleic acid molecules are derived from genes encoding transferases and/or enzymes for the transport or processing of a polysaccharide or oligosaccharide unit, including wzx or wzy genes, and wherein the panel of nucleic acid molecules is specific to a particular O antigen. The panel of nucleic acid molecules can include nucleic acid molecules derived from O antigen sugar pathway genes where necessary.

The inventors also found two mutated flagellin genes, in the H typing strains for H35 and H54, which carry insertion sequences inserted into normal flagellin genes identical or nearly identical to those of the H11 and H21 typing strains respectively. Thus, primers for H11 and H21 (listed in Table 3) would also amplify fragments in H35 and H54, which differ in size from those in H11 and H21 respectively. The inventors also provide two pairs of primers each for H35 and H54 based on the insertion sequence (see the H35 and H54 columns in Table 3). Using one of these primers in combination with one of the H11 or H21 primers will generate a PCR band only in H35 or H54 respectively, and this also differentiates H35 and H54 from H11 and H21 respectively.
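The two-reaction logic described above for H11/H35 and H21/H54 can be summarised as a small decision function. This is a sketch under assumptions: it uses only the presence or absence of each band and ignores the size difference, and returning "recheck" for a combination band without the base-pair band is a design choice of this example, not something the specification prescribes. The function name is hypothetical.

```python
def call_type(base_type: str, is_type: str, base_pair_band: bool, is_combo_band: bool) -> str:
    """Interpret two PCR reactions for a base H type and its IS-disrupted relative.

    base_pair_band: product from the base type's primer pair (e.g. H11); per
        the text, this pair also amplifies the IS-bearing gene, at another size.
    is_combo_band: product from an insertion-sequence primer used with one of
        the base type's primers; expected only for the IS-bearing type.
    """
    if is_combo_band and base_pair_band:
        return is_type
    if is_combo_band:
        # Not an expected pattern; this sketch asks for a repeat, not a guess.
        return "recheck"
    if base_pair_band:
        return base_type
    return "neither"


if __name__ == "__main__":
    print(call_type("H11", "H35", base_pair_band=True, is_combo_band=True))
    print(call_type("H21", "H54", base_pair_band=True, is_combo_band=False))
```

In practice the differing product sizes from the base primer pair give a second, independent check on the same call.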

The present invention also relates to methods of detecting the presence of particular E. coli H antigens, or H antigen and O antigen combinations, in which one or more nucleic acid molecules that generate a fragment of a particular size indicative of the presence of that H antigen are used, or in which the combination of one antigen-specific primer for that H antigen with another primer for a related H antigen provides for the detection of the particular H antigen by hybridisation to the relevant gene. Preferably, the H antigen is H11, H21, H35 or H54.

Where the method of the fifth aspect is used, both nucleic acid molecules of a pair may hybridise to the relevant H or O antigen gene, or alternatively only one may hybridise to the relevant gene and the other to another site.

In applying the methods of the invention for detecting combinations of O and H antigens to samples, the inventors recognise that the methods do not indicate whether a positive result for a particular O and H antigen combination arises because the O and H antigens are present on a single E. coli strain in the sample or on different E. coli strains in the sample. Because the ability to identify E. coli strains with particular O and H antigen combinations is highly desirable (owing to the relationship between particular combinations and pathogenicity), the determination that a particular combination is present in a sample can be followed by isolating single colonies and checking whether they contain the relevant combination, either by using the same method again or by using antibody-labelled magnetic beads to separate cells expressing the particular O or H antigen and then testing the isolated cells for the other serotype.
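The distinction between a sample-level result and a single-strain result can be made concrete with the sketch below. It models a mixed sample as the union of its colonies (an assumption for illustration) and shows how a sample can test positive for both targets even though no single colony carries both. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColonyResult:
    colony_id: str
    o_positive: bool  # O antigen gene target detected in this colony
    h_positive: bool  # flagellin (H) gene target detected in this colony


def sample_level_call(colonies: list) -> bool:
    """What a test on the mixed sample can say: both targets occur somewhere."""
    return any(c.o_positive for c in colonies) and any(c.h_positive for c in colonies)


def double_positive_colonies(colonies: list) -> list:
    """Colonies carrying both targets, i.e. a single strain with the combination."""
    return [c.colony_id for c in colonies if c.o_positive and c.h_positive]


if __name__ == "__main__":
    mixed = [ColonyResult("A", True, False), ColonyResult("B", False, True)]
    # Positive at sample level, yet no colony carries the combination.
    print(sample_level_call(mixed), double_positive_colonies(mixed))
```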

In addition, as mentioned above, the present inventors have established the existence of H7 primers specific to the O157 and O55 serotypes. Using such primers it is possible to detect particular O and H antigen combinations with H-specific nucleic acid molecules.

In a sixth aspect the invention provides a method for detecting the presence of a particular O serotype and H serotype of E. coli in a sample, the method comprising the following steps:

(a) specifically hybridising at least one nucleic acid molecule, derived from and specific for a gene encoding a flagellin associated with a particular E. coli H antigen serotype, to any E. coli carrying the gene and present in the sample; and

(b) detecting the at least one specifically hybridised nucleic acid molecule, wherein the at least one nucleic acid molecule is specific for the particular combination of O and H antigen.

Preferably the combination is O55:H7 or O157:H7. The ability to detect the O157:H7 combination with a particular H7 primer or primer pair is of particular use given the association of this combination with pathogenic strains.

In a seventh aspect the present invention provides a method for testing a food-derived sample for the presence of one or more particular E. coli O antigens and H antigens, comprising testing the sample by a method of the fourth, fifth or sixth aspect of the invention.

In an eighth aspect the present invention provides a method for testing a faecal-derived sample for the presence of one or more particular E. coli O antigens and H antigens, comprising testing the sample by a method of the fourth, fifth or sixth aspect of the invention.

In a ninth aspect the present invention provides a method for testing a patient- or animal-derived sample for the presence of one or more particular E. coli O antigens and H antigens, comprising testing the sample by a method of the fourth, fifth or sixth aspect of the invention.

Preferably, the method of the seventh, eighth or ninth aspect of the invention is a polymerase chain reaction method. More preferably the oligonucleotide molecules for use in the method are labelled. Even more preferably the hybridised nucleic acid molecules are detected by electrophoresis.

In the methods described above, it will be understood that where pairs of nucleic acid molecules are used, one of the nucleic acid molecules may hybridise to a sequence that is not from the O antigen transferase, wzx or wzy gene or the flagellin gene. Further, where both hybridise to these genes, the O antigen molecules may hybridise to the same or a different one of these genes.

In a tenth aspect the present invention provides a kit for identifying the H serotype of E. coli, the kit comprising:

at least one nucleic acid molecule derived from and specific for an E. coli flagellin gene.

In an eleventh aspect the present invention provides a kit for identifying the H and O serotype of E. coli, the kit comprising:

(a) at least one nucleic acid molecule derived from and specific for an E. coli flagellin gene; and

(b) at least one nucleic acid molecule derived from and specific for a gene encoding a transferase or a gene encoding an enzyme for the transport or processing of a polysaccharide or oligosaccharide unit, the gene being involved in the synthesis of a particular E. coli O antigen.

The nucleic acid molecules may be provided in the 
same or different vials. The kit may also provide in the 
same or separate vials a second set of specific nucleic 
acid molecules. 

Particularly preferred nucleic acid molecules for inclusion in the kits are those specified in Tables 3, 8, 8A, 9 and 9A, as described above.

DEFINITIONS 

In this specification, we have used the term "flagellin gene" in many cases where previously one would have used "fliC", to allow for the uncertainty as to locus introduced by recent observations. However, uncertainty as to the locus does not alter the fact that most E. coli strains express a single H antigen and that a single flagellin gene sequence per strain is required to give the genetic basis for H antigen variation. Any use of the name fliC in this specification where a different locus is later shown to be involved would not affect the validity of conclusions drawn regarding the application of information based on the sequence, where those conclusions do not relate to the map position. Thus it is generally the nucleic acid molecule itself which is of importance, rather than the name attributed to the gene. When it is known or suspected that the gene encoding the H antigen is not in the fliC locus, we use the term flagellin rather than fliC.

The phrase "a nucleic acid molecule derived from a gene" means that the nucleic acid molecule has a nucleotide sequence which is either identical or substantially similar to all or part of the identified gene. Thus a nucleic acid molecule derived from a gene can be a molecule which is isolated from the identified gene by physical separation from that gene, or a molecule which is artificially synthesised and has a nucleotide sequence which is either identical to or substantially similar to all or part of the identified gene. While some workers consider only the DNA strand with the same sequence as the mRNA transcribed from the gene, here either strand is intended.
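Because either strand is intended, a check of whether an oligonucleotide is "derived from" a gene has to look at the gene sequence and its reverse complement. The sketch below covers only the identical case; "substantially similar" matches would need an alignment or mismatch-tolerant search, which is not modelled here. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_either_strand(oligo: str, gene: str) -> bool:
    """True if oligo occurs exactly on the given strand of gene or on its complement."""
    oligo, gene = oligo.upper(), gene.upper()
    return oligo in gene or reverse_complement(oligo) in gene


if __name__ == "__main__":
    gene = "ATGGCACAAGTC"  # illustrative fragment only
    print(matches_either_strand("GCACAA", gene))  # same strand as gene
    print(matches_either_strand("TTGTGC", gene))  # complementary strand
```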

Transferase genes are regions of nucleic acid which have a nucleotide sequence which encodes gene products that transfer monomeric sugar units.

Flippase or wzx genes are regions of nucleic acid which have a nucleotide sequence which encodes a gene product that flips oligosaccharide repeat units, generally composed of three to six monomeric sugar units, to the external surface of the membrane.

Polymerase or wzy genes are regions of nucleic acid which have a nucleotide sequence which encodes gene products that polymerise repeating oligosaccharide units, generally composed of three to six monomeric sugar units.

The nucleotide sequences provided in this specification are described as anti-sense sequences. This term is used in the same manner as in the Glossary of Biochemistry and Molecular Biology, Revised Edition (David M. Glick, 1997, Portland Press Ltd., London), page 11, where the term is described as referring to one of the two strands of double-stranded DNA, usually the one which has the same sequence as the mRNA. We use it to describe this strand, which has the same sequence as the mRNA.



NOMENCLATURE

Synonyms for E. coli O111 rfb genes

Current names    Our names    Bastin et al. 1991
wbdH             orf1
gmd              orf2
wbdI             orf3         orf3.4*
manC             orf4         rfbM*
manB             orf5         rfbK*
wbdJ             orf6         orf6.7*
wbdK             orf7         orf7.7*
wzx              orf8         orf8.9 and rfbX*
wzy              orf9
wbdL             orf10
wbdM             orf11

* Nomenclature according to Bastin D.A., et al. 1991 "Molecular cloning and expression in Escherichia coli K-12 of the rfb gene cluster determining the O antigen of an E. coli O111 strain". Mol. Microbiol. 5:9 2223-2231.

Other synonyms

wzy     rfc
wzx     rfbX
rmlA    rfbA
rmlB    rfbB
rmlC    rfbC
rmlD    rfbD
glf     orf6♦
wbbI    orf3#, orf8♦ of E. coli K-12
wbbJ    orf2#, orf9♦ of E. coli K-12
wbbK    orf1#, orf10♦ of E. coli K-12
wbbL    orf5#, orf11♦ of E. coli K-12

# Nomenclature according to Yao, Z. and M. A. Valvano 1994. "Genetic analysis of the O-specific lipopolysaccharide biosynthesis region (rfb) of Escherichia coli K-12 W3110: identification of genes that confer group 6 specificity to Shigella flexneri serotypes Y and 4a". J. Bacteriol. 176: 4133-4143.

♦ Nomenclature according to Stevenson et al. 1994. "Structure of the O antigen of E. coli K-12 and the sequence of its rfb gene cluster". J. Bacteriol. 176: 4144-4156.

• The O antigen genes of many species were given rfb names (rfbA etc.) and the O antigen gene cluster was often referred to as the rfb cluster. There are now new names for the rfb genes, as shown in the table. Both terminologies have been used herein, depending on the source of the information.
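Because both terminologies appear in this specification, a lookup from old rfb-style names to current names can help when comparing sequences or annotations. The sketch below includes only the rfb/rfc gene names from the tables above. The orf numbers are deliberately left out, since the same orf number refers to different genes in different clusters and studies. The dictionary and function names are hypothetical.

```python
# Old rfb-style gene names mapped to current names, taken from the tables above.
OLD_TO_CURRENT = {
    "rfc": "wzy",
    "rfbX": "wzx",
    "rfbA": "rmlA",
    "rfbB": "rmlB",
    "rfbC": "rmlC",
    "rfbD": "rmlD",
    "rfbM": "manC",  # O111 cluster, Bastin et al. 1991 naming
    "rfbK": "manB",  # O111 cluster, Bastin et al. 1991 naming
}


def current_name(gene: str) -> str:
    """Current name for an old rfb-style name; any other name is returned unchanged."""
    return OLD_TO_CURRENT.get(gene, gene)


if __name__ == "__main__":
    for name in ("rfbX", "rfc", "wbdH"):
        print(name, "->", current_name(name))
```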
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows EcoRI restriction maps of cosmid clones pPR1054, pPR1055, pPR1056, pPR1058 and pPR1287, which are subclones of the E. coli O111 O antigen gene cluster. The thickened line is the region common to all clones. Broken lines show segments that are non-contiguous on the chromosome. The deduced restriction map for E. coli strain M92 is shown above.

Figure 2 shows a restriction mapping analysis of the E. coli O111 O antigen gene cluster within the cosmid clone pPR1058. Restriction enzymes are: B, BamHI; Bg, BglII; E, EcoRI; H, HindIII; K, KpnI; P, PstI; S, SalI; X, XhoI. Plasmids pPR1230, pPR1231 and pPR1288 are deletion derivatives of pPR1058. Plasmids pPR1237, pPR1238, pPR1239 and pPR1240 are in pUC19. Plasmids pPR1243, pPR1244, pPR1245, pPR1246 and pPR1248 are in pUC18, and pPR1292 is in pUC19. Plasmid pPR1270 is in pT7T319U. Probes 1, 2 and 3 were isolated as internal fragments of pPR1246, pPR1243 and pPR1237 respectively. Dotted lines indicate that subclone DNA extends to the left of the map into attached vector.

Figure 3 shows the structure of the E. coli O111 O antigen gene cluster.

Figure 4 shows the structure of the E. coli O157 O antigen gene cluster.

Figure 5 shows the nucleotide sequence of the E. coli O111 O antigen gene cluster. Note: (1) the first and last three bases of a gene are underlined and in italics respectively; (2) the region which was previously sequenced by Bastin and Reeves 1995 "Sequence and analysis of the O antigen gene (rfb) cluster of Escherichia coli O111". Gene 164: 17-23 is marked.

Figure 6 shows the nucleotide sequence of the E. coli O157 O antigen gene cluster. Note: (1) the first and last three bases of a gene (region) are underlined and in italics respectively; (2) the region previously sequenced by Bilge et al. 1996 "Role of the Escherichia coli O157:H7 O side chain in adherence and analysis of an rfb locus". Infect. Immun. 64:4795-4801 is marked.

Figures 7 to 18 show the nucleotide sequences obtained for flagellin genes from E. coli typing strains for H1-H12 respectively. The primer positions listed in Table 3 are based on treating the first nucleotide of each of these sequences as No. 1.

Figures 19 to 26 show the nucleotide sequences obtained for flagellin genes from E. coli typing strains for H14-H21 respectively. The primer positions listed in Table 3 are based on treating the first nucleotide of each of these sequences as No. 1.

Figures 27 to 39 show the nucleotide sequences obtained for flagellin genes from E. coli typing strains for H23-H35 respectively. The primer positions listed in Table 3 are based on treating the first nucleotide of each of these sequences as No. 1.

Figures 40 to 55 show the nucleotide sequences obtained for flagellin genes from E. coli typing strains for H37-H52 respectively. The primer positions listed in Table 3 are based on treating the first nucleotide of each of these sequences as No. 1.

Figures 56 to 58 show the nucleotide sequences obtained for flagellin genes from E. coli typing strains for H54-H56 respectively. The primer positions listed in Table 3 are based on treating the first nucleotide of each of these sequences as No. 1.

Figures 59 to 68 show the nucleotide sequences obtained for flagellin genes from E. coli H7 strains M1179, M1004, M1211, M1200, M1686, M1328, M917, M527, M973 and M918 respectively. The primer positions listed in Table 3 are based on treating the first nucleotide of each of these sequences as No. 1.
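The figure numbering above follows a regular pattern, with consecutive figures for consecutive H types and gaps where no sequence is shown (H13 and H22 are not valid types, and H36 and H53 were not sequenced). The lookup below encodes that pattern. Since primer positions count the first nucleotide as No. 1, a position p corresponds to index p - 1 in a zero-based string. The names are hypothetical.

```python
# Figure blocks as listed above: (first figure, H types covered in order).
FIGURE_BLOCKS = [
    (7, range(1, 13)),    # Figures 7-18: H1-H12
    (19, range(14, 22)),  # Figures 19-26: H14-H21
    (27, range(23, 36)),  # Figures 27-39: H23-H35
    (40, range(37, 53)),  # Figures 40-55: H37-H52
    (56, range(54, 57)),  # Figures 56-58: H54-H56
]
H7_STRAINS = ["M1179", "M1004", "M1211", "M1200", "M1686",
              "M1328", "M917", "M527", "M973", "M918"]  # Figures 59-68


def figure_for_h_type(h: int):
    """Figure number for an H typing strain, or None if no figure is listed."""
    for first_figure, h_types in FIGURE_BLOCKS:
        if h in h_types:
            return first_figure + (h - h_types.start)
    return None


def figure_for_h7_strain(strain: str) -> int:
    """Figure number for one of the additional H7 strains."""
    return 59 + H7_STRAINS.index(strain)


if __name__ == "__main__":
    print(figure_for_h_type(7), figure_for_h_type(36), figure_for_h7_strain("M917"))
```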

BEST METHOD OF CARRYING OUT THE INVENTION 

In carrying out the methods of the invention with respect to the testing of particular sample types, including samples from food, patients, animals and faeces, the samples are prepared by techniques routinely used in the preparation of such samples for DNA-based testing. The steps for testing the samples using particular nucleic acid molecules in assay formats such as Southern blots and PCR are performed under routinely determined conditions appropriate to the sample and the nucleic acid molecules.

H antigen

Materials and Methods 

1. Bacterial strains: 

There are 54 H types in E. coli [Ewing, W.H.: Edwards and Ewing's Identification of the Enterobacteriaceae, Elsevier Science Publishers, Amsterdam, The Netherlands, 1986]; note that H antigens 1 to 57 were listed, of which 13, 22 and 57 are not valid. The standard H type strains were obtained from the Institute of Medical and Veterinary Science, Adelaide, Australia. The primary stocks are held at the Statens Serum Institut, Copenhagen, Denmark.

The additional H7 strains used are listed in Table 1. 

2. Isolation of chromosomal DNA: 

Chromosomal DNA from all 54 H type strains and the
strains listed in Table 1 was isolated using the Promega
Genomic isolation kit (Madison WI USA). Each chromosomal
DNA sample was checked by gel electrophoresis of the DNA
and by PCR amplification of the mdh gene using
oligonucleotides based on the E. coli K-12 mdh gene [Boyd,
E.F., Nelson, K., Wang, F.-S., Whittam, T.S. and Selander,
R.K.: Molecular genetic basis of allelic polymorphism in
malate dehydrogenase (mdh) in natural populations of
Escherichia coli and Salmonella enterica. Proc. Natl. Acad.
Sci. USA 91 (1994) 1280-1284].


3. PCR amplification of flagellin gene: 

Flagellin genes from different strains were first PCR
amplified using one of the following four pairs of
oligonucleotides:

#1285 (5'-atggcacaagtcattaatac) and #1286 (5'-ttaaccctgcagtagagaca);
#1417 (5'-ctgatcactcaaaataatatcaac) and #1418 (5'-ctgcggtacctggttggc);
#1431 (5'-atggcacaagtcattaatacccaac) and #1432 (5'-ctaaccctgcagcagagaca);
#1575 (5'-gggtggaaacccaatacg) and #1576 (5'-gcgcatcaggcaatttgg).
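The annealing temperature used with each pair varies (Table 2). For a rough first-pass comparison of the four pairs, one common rule of thumb for short oligonucleotides is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. Using it here is our illustrative assumption, not the method used to set the Table 2 temperatures; the sketch below simply tabulates length, GC content and that estimate for the primers listed above.

```python
# Rough primer statistics for the four amplification pairs listed above.
# Wallace rule (rule of thumb for short oligos): Tm ~ 2*(A+T) + 4*(G+C) degC.
PRIMERS = {
    "#1285": "atggcacaagtcattaatac",
    "#1286": "ttaaccctgcagtagagaca",
    "#1417": "ctgatcactcaaaataatatcaac",
    "#1418": "ctgcggtacctggttggc",
    "#1431": "atggcacaagtcattaatacccaac",
    "#1432": "ctaaccctgcagcagagaca",
    "#1575": "gggtggaaacccaatacg",
    "#1576": "gcgcatcaggcaatttgg",
}


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (degC) by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{wallace_tm(seq)} C")
```

The Wallace rule ignores salt and nearest-neighbour effects, so it is only a coarse guide for ranking primers.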

PCR reactions were carried out under the following
conditions: denaturing, 94°C for 30 s; annealing, at a
temperature that varies with the primer pair (see Table 2),
for 30 s; extension, 72°C for 1 min; 30 cycles. The PCR
product was purified using the Promega Wizard PCR
purification kit (Madison WI USA) before being sequenced.

The H36 and H53 strains gave two PCR bands, using
primer pairs #1431/#1432 and #1417/#1418 respectively, and
were not sequenced.

4. Sequencing of the flagellin genes: 

Each PCR product was first sequenced using the
oligonucleotide primers used for the PCR amplification.
Primers based on the obtained sequence were then used to
sequence further, and this procedure was repeated until the
entire PCR product was sequenced.
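The primer-walking procedure above can be sketched as a simple coverage count. This is a simplified model under stated assumptions (reads extend in one direction, each read gives a fixed number of usable bases, and each new primer is placed so that consecutive reads overlap by a fixed amount); the function name and the numbers are hypothetical, and read lengths in practice vary from run to run.

```python
def walking_reads(product_len: int, read_len: int, overlap: int) -> int:
    """Count reads needed to cover a PCR product by primer walking.

    Simplified model (assumption): reads go in one direction only, each
    read yields `read_len` usable bases from its primer, and each new
    primer is placed so the new read overlaps the previous one by
    `overlap` bases.
    """
    if read_len <= overlap:
        raise ValueError("read_len must exceed overlap or the walk never advances")
    covered = read_len  # first read, primed from the PCR primer itself
    reads = 1
    while covered < product_len:
        covered += read_len - overlap  # each later read extends coverage
        reads += 1
    return reads


# Hypothetical numbers: a 1,600 bp product, 500-base reads, 100-base overlap.
print(walking_reads(1600, 500, 100))  # 4
```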
The sequencing reactions were performed using the
DyeDeoxy Terminator Cycle Sequencing method (Applied
Biosystems, CA, USA), and reaction products were analysed
using fluorescent dye and an ABI377 automated sequencer
(CA, USA).

Sequence data were processed and analysed using Staden
programs [Sacchi C T, Zanella R C, Caugant D A, Frasch C E,
Hidalgo N T, Milagres L, Pessoa L L, Ramos S R, Camargo M
C C and Melles C E A "Emergence of a new clone of serogroup
C Neisseria meningitidis in Sao Paulo, Brazil" J. Clin.
Microbiol. 30 (1992) 1282-1286;
Staden, R.: Automation of the computer handling of gel
reading data produced by the shotgun method of DNA
sequencing. Nucl. Acids Res. 10 (1982a) 4731-4751;
Staden, R.: An interactive graphics program for comparing
and aligning nucleic acid and amino acid sequences. Nucl.
Acids Res. 10 (1982b) 2951-2961;
Staden, R.: Computer methods to locate signals in nucleic
acid sequences. Nucl. Acids Res. 12 (1984a) 505-519;
Staden, R.: Graphic methods to determine the function of
nucleic acid sequences. A summary of ANALYSEQ options.
Nucl. Acids Res. 12 (1984b) 521-538;
Staden, R.: The current status and portability of our
sequence handling software. Nucl. Acids Res. 14 (1986)
217-231].

We were able to PCR amplify flagellin genes from the H
typing strains for H7, 23, 12, 51, 45, 49, 19, 9, 30, 32,
26, 41, 15, 16, 20, 28, 46, 31, 14, 18, 6, 34, 48, 43, 10
and 52, and also from H7 strains M1004, M527, M1686, M1211,
M1328, M973, M1179, M1200, M917 and M918, using primers
#1575 and #1576, which are based on sequences 51-34 bp
upstream of the start and 37-54 bp downstream of the end of
the E. coli K-12 fliC gene respectively. Thus the full
sequence of the flagellin gene was obtained from these
strains, and the use of flanking sequence for the primers
makes it highly likely that these genes are at the fliC locus.
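Because #1575 and #1576 lie in the flanking DNA, the expected product is the whole gene plus 51 bp upstream and 54 bp downstream. A minimal sketch, assuming "end" means the last base of the stop codon; the 1,500 bp gene length in the example is a hypothetical input, not a measured value. The assertions also check that each stated primer interval (51-34 and 37-54) spans exactly one 18-nt primer.

```python
PRIMER_1575 = "gggtggaaacccaatacg"  # 51-34 bp upstream of the fliC start
PRIMER_1576 = "gcgcatcaggcaatttgg"  # 37-54 bp downstream of the fliC end

UPSTREAM_5PRIME = 51    # bp from #1575's 5' end to the first base of the gene
DOWNSTREAM_5PRIME = 54  # bp from the last base of the gene to #1576's 5' end

# Consistency check: each stated interval spans exactly one primer length.
assert len(PRIMER_1575) == 51 - 34 + 1
assert len(PRIMER_1576) == 54 - 37 + 1


def flanking_amplicon_length(gene_len: int) -> int:
    """Expected #1575/#1576 product size for a flagellin gene of gene_len bp."""
    return UPSTREAM_5PRIME + gene_len + DOWNSTREAM_5PRIME


print(flanking_amplicon_length(1500))  # 1605 for a hypothetical 1,500 bp gene
```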

For other strains, we were only able to amplify the
flagellin gene using one or more of the other three pairs
of primers, which are based on sequence within the fliC
gene, and thus only partial sequence was obtained. These
amplicons may be of the fliC gene or of one of the alternative
flagellin genes. The flagellin gene sequences obtained from the
H typing strains for H40, 8, 21, 47, 11, 27, 35, 2, 3, 24, 37,
50, 4, 44, 38, 55, 29, 33, 5 and 56 lack 18 and 14 codons at
the 5' and 3' ends respectively. The flagellin gene sequence of
H39 obtained using primers #1285/#1286 lacks 18 and 19 codons
at the 5' and 3' ends respectively. The flagellin gene
sequences of the H typing strains for H17, 25 and 42 lack 23
and 21 codons at the 5' and 3' ends respectively.
The flagellin gene sequence of the H typing strain for H54
lacks 23 and 12 codons at the 5' and 3' ends respectively.
There is very little variation in sequence at the two
ends of flagellin genes, and antigenic variation is due to
variation in the central region of the gene. The absence of
sequence for the ends of some of the flagellin genes is
therefore not important for the purpose of the present
invention, which relates to the detection of antigenic
variation by DNA sequence-based means.
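The missing ends quoted above convert directly to nucleotides (3 nt per codon). The short sketch below only restates the reported codon counts as nucleotide totals; the group labels are our own shorthand for the lists in the text.

```python
# Missing codons at the (5', 3') ends, as reported for each group above.
MISSING_CODONS = {
    "first list (20 H types, H40 to H56)": (18, 14),
    "H39 (primers #1285/#1286)": (18, 19),
    "H17, H25, H42": (23, 21),
    "H54": (23, 12),
}


def missing_nucleotides(five_prime_codons: int, three_prime_codons: int) -> int:
    """Total nucleotides absent from a partial sequence (3 nt per codon)."""
    return 3 * (five_prime_codons + three_prime_codons)


for group, (five, three) in MISSING_CODONS.items():
    print(f"{group}: {missing_nucleotides(five, three)} nt missing")
```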

The fliC genes from the H type strains for H1, H7 and H12
have been sequenced previously [Schoenhals, G. and
Whitfield, C.: Comparative analysis of flagellin sequences
from Escherichia coli strains possessing serologically
distinct flagellar filaments with a shared complex surface
pattern. J. Bacteriol. 175 (1993) 5395-5402], and we did not
sequence the gene from the H1 strain.

We have sequenced fliC genes from a set of H7 strains
with different O antigens, including fliC from the
H7 typing strain: we have found four differences from the
published H7 sequence (GenBank accession number L07388),
which we believe are due to errors in the published sequence.

We have also re-sequenced the fliC gene from the H12
type strain, and have found one difference from the
published H12 sequence (GenBank accession number L07389),
which we believe is due to an error in the published
sequence.

The flagellin genes from the type strains for H35 and H54 were
also amplified using primers #1431/#1432, which are based
on sequence within the fliC gene. Sequence data revealed
that these two genes would be non-functional owing to an
insertion sequence inserted in the middle of each. We have
sequenced them to facilitate selection of primers for the
functional flagellin genes.

5. Comparison and alignment of the flagellin genes:

The programs Pileup [Devereux, J., Haeberli, P. and Smithies,
O.: A comprehensive set of sequence analysis programs for
the VAX. Nucl. Acids Res. 12 (1984) 387-395] and Multicomp
[Reeves, P.R., Farnell, L. and Lan, R.: MULTICOMP: a
program for preparing sequence data for phylogenetic
analysis. CABIOS 10 (1994) 281-284] were used.

The previously published sequence of H1 (GenBank
accession number L07387) was extracted from GenBank and
used. Because we did not sequence the H36 and H53 flagellin
genes, we compared only 52 flagellin genes of H typing
strains and the fliC genes from the additional 10 H7
strains.

Among the H7 fliC genes, the DNA difference ranged from
0.0% to 2.39%. Some of the flagellin genes from different
typing strains are identical: those from H40 and H8 are
identical, as are those from H15 and H16. Others are nearly
identical: H21 and H47 (1.5% difference), H12 and H1 (2.6%
difference), H10 and H50 (0.3% difference) and H38 and H55
(0.1% difference), while H4, H44 and H17 are very similar,
with pairwise differences ranging from 0.33% to 0.87%.
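The percentage differences above are pairwise comparisons of aligned genes. As a minimal sketch, assuming a simple uncorrected p-distance and assuming that alignment columns containing a gap are skipped (Pileup/Multicomp may treat gaps differently), the calculation is:

```python
def percent_difference(a: str, b: str) -> float:
    """Percentage of differing sites between two aligned sequences,
    skipping any column with a gap ('-') in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Toy aligned fragments (hypothetical), not real flagellin sequence.
print(percent_difference("ACGTACGTAC", "ACGTACGTAA"))  # 10.0
```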

Where the flagellin genes from two type strains are nearly
identical, we conclude that both genes encode flagellin of
the same H specificity and that one or other of the strains
has an additional locus carrying the functional gene,
although the flagellin genes sequenced do not appear to be
mutated.

As discussed above, genes encoding some H antigens
have been shown to be located at loci other than fliC: H3,
H36, H47 and H53 have been shown to be at a locus called flkA,
H44 and H55 at fllA, and H54 at flmA [Ratiner Y A
(1998) "New flagellin-specifying genes in some Escherichia
coli strains" J. Bacteriol. 180 979-984]. However, these
strains may carry a fliC gene in addition to flkA, fllA or flmA
[Ratiner Y A (1998) "New flagellin-specifying genes in some
Escherichia coli strains" J. Bacteriol. 180 979-984].

The flagellin gene encoding H48 was previously
sequenced from E. coli strain K-12 [Kuwajima G, Asaka J,
Fujiwara T, Node K and Kondo E "Nucleotide sequence of the
hag gene encoding flagellin of Escherichia coli" J.
Bacteriol. 168 (1986) 1479-1483]. We have sequenced the
fliC gene from the H48 typing strain and found that it is
identical to that from K-12.

The H54 gene is known to be at flmA [Ratiner Y A
(1998) "New flagellin-specifying genes in some Escherichia
coli strains" J. Bacteriol. 180 979-984], and the finding
of a non-functional presumptive fliC locus in the H54 strain
shows that fliC is present but not expressed. However, we
have not amplified and sequenced the functional flmA gene of
this strain. The two bands from the H36 and H53 strains
(both obtained using primers based on fliC sequence) are
thought to be from the fliC and flkA loci, but the bands
were not purified and have not been sequenced. The fliC
genes of the H21 and H47 strains share 98.5% identity at the
DNA level, and those of the H38 and H55 strains share 99.9%
identity: as H47 and H55 map at flkA and fllA respectively,
we believe we have sequenced the fliC genes encoding the H21
and H38 antigenic specificities respectively, and that these
genes are present but not expressed in the H47 and H55
strains.

The genes encoding H36, H47 and H53 (at flkA), H44 and
H55 (at fllA) and H54 (at flmA) are yet to be sequenced:
the attempts so far gave either two bands, so that the gene
was not sequenced (H36 and H53), a non-functional gene (H54
and H35), or sequences from the typing strains for these
specificities that are very similar to those of other
strains (H47, H44 and H55), and we suspect that in these
cases we have sequenced a non-functional fliC gene. Also,
for the other pairs with highly similar flagellin genes
shown by our sequence comparison, we do not know which H
specificity we have sequenced, leaving one of each pair
still to be sequenced (H40 or H8, H15 or H16, H12 or H1,
H10 or H50, and H4 or H17).

Using the 42 unique sequences and the sequences of the
two non-functional flagellin genes (from the H typing
strains for H35 and H54) (see Table 3), we have been able to
determine antigen-specific primers for each of these H
antigen specificities, and thereby show that it is
practicable to detect E. coli strains carrying specific H
antigens without false positives from strains of other H
types. There is no reason to expect that adding the
remaining 12 sequences to the 42 unique sequences obtained
will affect this general conclusion since, unlike previous
reports, our study covers flagellin sequences for a
substantial majority of the known E. coli H antigen
specificities.

Our study of 11 H7 genes from strains with eight
different O antigens shows that variation within the gene
is limited, such that variation within the genes for an H
antigen should not affect the ability to select
antigen-specific primers. O:H combinations in general define
a strain, and as some of the strains thus defined were quite
distant from each other in a study by Whittam [Whittam T S,
Wolfe M L, Wachsmuth I K, Orskov I and Wilson R A "Clonal
relationships among Escherichia coli strains that cause
hemorrhagic colitis and infantile diarrhea" Infect. Immun.
61 (1993) 1619-1629], the variation we observe is thought to
be representative of that present in H7 genes. However,
there is a small possibility that primers chosen without
knowledge of the variation within the genes of each H
specificity could fail to give positive results with some
isolates, through the chance choice of primers covering a
base or bases that contribute to this low level of
variation. The variation within the H7 genes is in the
normal range for variation within a gene in E. coli, and if
this did occur it would be easy to use an alternative primer
pair.

There are 54 known H antigens for E. coli, and of these
there are 12 H antigen specificities for which we do not
yet have sequence. It will be easy to determine these
sequences and to determine primer pairs specific for these
H antigens by comparing them with the 44 sequences obtained
(see Table 3), and also to modify the primers selected for
any H antigen for which we already know the sequence, in the
unlikely event that there is a possibility of false
positives with the primers selected.

The sequences for the remaining H antigens can be
obtained in one of the following ways:

1. Where we have two bands by PCR (the H36 and H53 typing
strains), we purify and sequence each band, and also clone
each into a strain mutated in its fliC gene and
determine the H antigen expressed by use of specific
sera. In this way a specific sequence can be related
to an H antigen specificity. The other band, which
represents an H antigen gene for a different
specificity, is expected to include a mutant gene or a
gene similar to one of those already sequenced, but if
not it may represent a new specificity for which primer
pairs could be selected. It may be difficult to obtain
expression of flagellin genes cloned from E. coli
because they are cloned together with regulatory sequences
that prevent expression. This is easily avoided by
cloning the major segment of the gene into a
functioning fliC gene to replace the equivalent
segment of that gene, using standard site-directed
mutagenesis to introduce suitable restriction sites within
the cloned gene and incorporating those restriction
sites into the primers used to amplify the major segment
of the gene to be studied, to facilitate the cloning.

2. Where two or three strains have the same flagellin
gene sequence, the genes are cloned as above and the H
antigen specificity represented by this sequence is
determined. This identifies the strain in which the
gene is expressed and also those strains for which we
have sequenced a gene that is not being expressed.
We then clone the gene for the antigen expressed in
these strains by making a bank of plasmid clones from
chromosomal DNA and selecting a clone that expresses
an H antigen different from the one represented by the
known sequence. This can be done by taking advantage
of the fact that the H antigen is on flagellin, the
protein of the bacterial flagellum used for movement
of the bacteria. In the presence of antibodies specific
to that flagellum the bacteria cannot swim. For
selection, the clones are placed in a situation in
which motile cells can swim away from the others and
be collected. There are many versions of these
techniques and any could be used. One version is to
place the bacteria on a nutrient agar plate with
reduced agar content, such that bacteria can swim away
from the site of inoculation. This is easily seen as
growth on the plate, and a sample of the motile
bacteria can be recovered and cultivated. In this way
bacteria carrying cloned H antigen genes can be
selected. If antibody is added to the medium in the
plate, only bacteria that express an H antigen
different from that recognised by the antiserum will
be able to swim. Specifically, if the antiserum used
is specific for the H antigen expressed by the gene for
which we have sequence, only clones that express a
different H antigen, such as those expressing the H
antigen expressed by the H typing strain used to make
the plasmid bank, will be selected. Once the clone is
obtained, the H antigen gene can be sequenced.

Our work has shown that there are at least 8 cases
where the H antigen typing strains carry two H antigen
genes that appear to be complete and have the potential to
function. Although E. coli does not in general have the
capacity to express more than one flagellin gene, it is
striking that there are several loci for flagellin genes
[Ratiner Y A (1998) "New flagellin-specifying genes in some
Escherichia coli strains" J. Bacteriol. 180 979-984].
Several of the pairs of H typing strains with identical
sequences do not include any of the H antigen types shown by
Ratiner [Ratiner Y A (1998) "New flagellin-specifying genes
in some Escherichia coli strains" J. Bacteriol. 180 979-984]
to map other than at fliC, although these predominate.
This suggests that there are additional cases where an
expressed gene is not the only flagellin gene present.

However, the fact that the flagellin gene sequences for many
of the typing strains for H antigens found by Ratiner [Ratiner
Y A (1998) "New flagellin-specifying genes in some
Escherichia coli strains" J. Bacteriol. 180 979-984] to map
away from fliC are among those nearly identical to others
indicates that the phenomenon is of limited extent.
Nonetheless, even where only one gene has been obtained by
PCR, it remains possible that it is one of a pair of
flagellin genes, the other not being amplified by the
primers used, and further that it is the one not amplified
which expresses the H antigen of the strain. It will
therefore be necessary to clone, as described above, each of
the flagellin genes we have sequenced and confirm that it
expresses the expected antigen, to ensure that the invention
gives results corresponding to those of the traditional
serotyping scheme. In the event that it does not, the gene
for the type antigen can be cloned and sequenced by the
means described above.

The 11 H7 fliC sequences fell into three groups, one
comprising the genes from the O157:H7 and O55:H7 strains,
which were identical, as expected given the proposed
relationship between the clones: E. coli O157:H7 and O55:H7
clones have been shown to be closely related [Whittam
T S, Wolfe M L, Wachsmuth I K, Orskov I and Wilson R A
"Clonal relationships among Escherichia coli strains that
cause hemorrhagic colitis and infantile diarrhea" Infect.
Immun. 61 (1993) 1619-1629]. Among the H7 fliC sequences,
we can identify primers specific to the H7 fliC gene of each
of the three H7 groups. Two of these primers, each in
combination with an H7-specific primer, gave two primer
pairs specific for the H7 gene from the O157:H7 and O55:H7
clones.

6. Specific oligonucleotide primers for each of
the 42 H types

Two oligonucleotide primers were chosen based on each of
the 42 sequences. None of them had more than 85% identity
with any other of the 61 flagellin gene sequences. Thus,
these primers are specific for each H type. These primers
are listed in Table 3.
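The 85% identity criterion above can be checked computationally. A minimal sketch, assuming identity is scored as the best ungapped match of the primer (on either strand) against every window of each other sequence; the original text does not say how identity was computed, so this scoring scheme, the function names and the toy sequences are all our assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (returned in upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_window_identity(primer: str, target: str) -> float:
    """Highest fraction of matching bases over all ungapped placements of
    the primer (either strand) along the target sequence."""
    target = target.upper()
    best = 0.0
    for probe in (primer.upper(), reverse_complement(primer)):
        n = len(probe)
        for i in range(len(target) - n + 1):
            matches = sum(a == b for a, b in zip(probe, target[i:i + n]))
            best = max(best, matches / n)
    return best


def passes_specificity(primer: str, off_targets: list[str], max_identity: float = 0.85) -> bool:
    """True if no off-target sequence matches the primer above max_identity."""
    return all(best_window_identity(primer, t) <= max_identity for t in off_targets)


# Hypothetical toy sequences, not real flagellin genes.
print(passes_specificity("ACGTACGTAC", ["AAAAAAAAAAAAAAA"]))  # True
print(passes_specificity("ACGTACGTAC", ["TTACGTACGTACTT"]))   # False
```

A gapped aligner would be stricter about near-matches with indels; the ungapped scan is only a quick screen.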

The fliC gene of the H54 typing strain is a mutated
gene: it has an insertion sequence (IS1222) inserted into
a normal flagellin gene of H21. Thus, primers for H21
would amplify a fragment of a different size in H54. We
also provide 2 primers based on the insertion sequence
(see the H54 column in Table 3); using one of them in
combination with one of the H21 primers will generate a
PCR band only in H54, and this will also differentiate
H54 from H21.

The fliC gene of the H35 type strain is also a mutated
gene: it has an insertion sequence (IS1) inserted into a
normal flagellin gene of H11. Thus, primers for H11
would amplify a fragment of a different size in H35. We
also provide 2 primers based on the insertion sequence
(see the H35 column in Table 3); using one of them in
combination with one of the H11 primers will generate a
PCR band only in H35, and this will also differentiate
H35 from H11.
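The two paragraphs above describe a simple decision rule: a band from an insertion-sequence primer paired with a parental-type primer indicates the IS-carrying type, while the parental pair alone gives a larger product in the IS carrier. A minimal sketch of that logic, assuming the IS element lies between the two parental primer sites; the function names and the band sizes in the example are hypothetical, not values from Table 3.

```python
def expected_size_with_insertion(normal_size: int, is_length: int) -> int:
    """Product size from the parental-type primers when an IS element of
    is_length bp sits between the two primer sites (assumed)."""
    return normal_size + is_length


def call_type(parental_band: bool, junction_band: bool,
              parental: str, is_carrier: str) -> str:
    """Interpret the two reactions described above.

    parental_band: band from the parental-type primer pair (any size)
    junction_band: band from an IS primer paired with a parental primer
    """
    if junction_band:
        return is_carrier  # IS-flagellin junction only in the IS carrier
    if parental_band:
        return parental    # parental pair alone, no IS junction
    return "neither"


print(call_type(True, True, "H21", "H54"))    # H54
print(call_type(True, False, "H11", "H35"))   # H11
print(expected_size_with_insertion(800, 1200))  # 2000 (hypothetical sizes)
```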

7. Testing of the H7-specific oligonucleotide primers

Primer pair #1806/#1809 (see Table 3) was used to
carry out PCR on chromosomal DNA samples of all 54 H
type strains and the H7 strains listed in Table 1. PCR
reactions were carried out under the following
conditions: denaturing, 94°C for 30 s; annealing, 58°C for
30 s; extension, 72°C for 1 min; 30 cycles. Each PCR
reaction was carried out in a volume of 50 µl for each
chromosomal sample. After the PCR reaction, 5 µl of PCR
product from each sample was run on an agarose gel to check
for amplified DNA.

Primer pair #1806/#1809 produced a band of the
predicted size with all 11 strains expressing H7,
but gave no band with the other H type strains. Thus,
these primers are H7-specific.
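The specificity claim above rests on a tally: a band in every strain expected to carry the target and in no other strain. A small sketch of that bookkeeping, assuming a simple band/no-band record per strain; the strain labels in the example are a hypothetical subset, not the full panel.

```python
def specificity_report(band_results: dict[str, bool], expected: set[str]) -> dict[str, list[str]]:
    """List strains that break the expected band pattern for a primer pair."""
    false_pos = sorted(s for s, band in band_results.items() if band and s not in expected)
    false_neg = sorted(s for s, band in band_results.items() if not band and s in expected)
    return {"false_positives": false_pos, "false_negatives": false_neg}


# Hypothetical subset of gel results.
results = {"M527": True, "M1004": True, "H1 type strain": False, "H12 type strain": False}
print(specificity_report(results, {"M527", "M1004"}))
# {'false_positives': [], 'false_negatives': []}
```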

8. Testing of oligonucleotide primers specific to H7 of
O157 and O55:

Based on a comparison of the fliC sequences of 11
different H7 strains, we have identified two
oligonucleotides [#1696 (5'-GGCCTGACTCAGGCGGCC), at
positions 178 to 195 in M527, and #1697
(5'-GAGTTACCGGCCTGCTGA), at positions 1700-1683 in M527]
which are unique to H7 of O157 and O55. Although these two
primers are not identical to any part of the fliC sequences
of the other H7 strains, they are identical or highly
similar to the fliC genes of some other H types. However, a
combination of one of these primers with one of the
H7-specific primers can give specificity for H7 of
O157 and O55 E. coli.
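The point above is that a primer pair only amplifies where both primers bind, so pairing a primer that cross-reacts with some other H types with an H7-specific primer narrows the targets to the intersection. A minimal sketch of that set logic; the binding table below is hypothetical and only illustrates the reasoning.

```python
def pair_amplifies(binding: dict[str, set[str]], fwd: str, rev: str) -> list[str]:
    """Strains on which both primers of a pair are predicted to bind."""
    return sorted(s for s, primers in binding.items() if fwd in primers and rev in primers)


# Hypothetical binding table: #1696 also binds some non-H7 type strain,
# #1809 binds every H7 strain; only the O157/O55 H7 strains have both.
binding = {
    "M527 (O157:H7)": {"#1696", "#1809"},
    "M1686 (O55:H7)": {"#1696", "#1809"},
    "other H7 strain": {"#1809"},
    "other H type (hypothetical)": {"#1696"},
}
print(pair_amplifies(binding, "#1696", "#1809"))
# ['M1686 (O55:H7)', 'M527 (O157:H7)']
```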

Primer pairs #1696/#1809 and #1697/#1806 were used
to carry out PCR on chromosomal DNA samples of all the H
type strains and the H7 strains listed in Table 1. PCR
reactions were carried out under the following
conditions: denaturing, 94°C for 30 s; annealing, 61°C for
30 s (for #1696/#1809) or 60°C for 30 s (for #1697/#1806);
extension, 72°C for 1 min; 30 cycles. Each PCR reaction was
carried out in a volume of 50 µl for each chromosomal
sample. After the PCR reaction, 5 µl of PCR product from
each sample was run on an agarose gel to check for
amplified DNA.

Both primer pairs produced a band of the predicted size
with both of the O157:H7 strains (strains M1004 and M527,
see Table 1) and the O55:H7 strain (strain M1686, see
Table 1), and gave no band with the other strains. Thus,
these two pairs of primers are specific to the H7 genes of
O157 and O55 E. coli strains.

O antigen 
Materials and Methods - part 1

The experimental procedures for the isolation and
characterisation of the E. coli O111 O antigen gene
cluster (position 3,021-9,981) are according to Bastin
D.A., et al. 1991 "Molecular cloning and expression in
Escherichia coli K-12 of the rfb gene cluster determining
the O antigen of an E. coli O111 strain". Mol. Microbiol.
5(9): 2223-2231 and Bastin D.A. and Reeves, P.R. 1995
"Sequence and analysis of the O antigen gene (rfb) cluster
of Escherichia coli O111". Gene 164: 17-23.

A. Bacterial strains and growth media 

Bacteria were grown in Luria broth supplemented as 
required. 

B. Cosmids and phage 

Cosmids in the host strain χ2819 were repackaged in
vivo. Cells were grown in 250 mL flasks containing 30 mL of
culture, with moderate shaking at 30°C, to an optical
density of 0.3 at 580 nm. The defective lambda prophage
was induced by heating in a water bath at 45°C for 15 min,
followed by incubation at 37°C with vigorous shaking
for 2 h. Cells were then lysed by adding 0.3 mL of
chloroform and shaking for a further 10 min. Cell debris
was removed from 1 mL of lysate by a 5 min spin in a
microcentrifuge, and the supernatant was transferred to a
fresh microfuge tube. One drop of chloroform was added and
shaken vigorously through the tube contents.

C. DNA preparation 

Chromosomal DNA was prepared from bacteria grown
overnight at 37°C in 30 mL of Luria broth.
After harvesting by centrifugation, cells were washed and
resuspended in 10 mL of 50 mM Tris-HCl, pH 8.0. EDTA was
added and the mixture incubated for 20 min. Lysozyme was
then added and incubation continued for a further 10 min.
Proteinase, SDS and ribonuclease were then added and
the mixture incubated for up to 2 h for lysis to occur.
All incubations were at 37°C. The mixture was then heated
to 65°C and extracted once with 8 mL of phenol at the same
temperature, then once with 5 mL of
phenol/chloroform/iso-amyl alcohol at 4°C. Residual
phenol was removed by two ether extractions. DNA was
precipitated with 2 vol. of ethanol at 4°C, spooled and
washed in 70% ethanol, resuspended in 1-2 mL of TE and
dialysed. Plasmid and cosmid DNA was prepared by a
modification of the Birnboim and Doly method [Birnboim, H.
C. and Doly, J. (1979) "A rapid alkaline extraction
procedure for screening recombinant plasmid DNA" Nucl.
Acids Res. 7: 1513-1523]. The volume of culture was 10 mL,
and the lysate was extracted with phenol/chloroform/iso-amyl
alcohol before precipitation with isopropanol.

Plasmid DNA to be used as vector was isolated on a
continuous caesium chloride gradient following alkaline
lysis of cells grown in 1 L of culture.

D. Enzymes and buffers. 

Restriction endonucleases and T4 DNA ligase were
purchased from Boehringer Mannheim (Castle Hill, NSW,
Australia) or Pharmacia LKB (Melbourne, VIC, Australia).
Restriction enzymes were used in the recommended
commercial buffer.

E. Construction of a gene bank. 

Individual aliquots of M92 chromosomal DNA (strain
Stoke W, from Statens Serum Institut, 5 Artillerivej, 2300
Copenhagen S, Denmark) were partially digested with 0.2 U of
Sau3AI for 1-15 min. Aliquots giving the greatest
proportion of fragments in the size range of approximately
40-50 kb were selected and ligated to vector pPR691
previously digested with BamHI and PvuII. Ligation
mixtures were packaged in vitro with packaging extract.
The host strain for transduction was χ2819, and
recombinants were selected with kanamycin.
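Choosing partial Sau3AI digests for 40-50 kb fragments can be rationalised with a simple model. Sau3AI recognises the 4-bp site GATC, which in random sequence with equal base frequencies occurs on average every 4^4 = 256 bp. If each site is cut independently with probability p, the mean fragment length is about 256/p. This random-sequence model is our assumption (real site spacing and digestion kinetics differ), and the authors selected aliquots empirically by gel, not by calculation.

```python
SAU3AI_SITE = "GATC"  # 4 bp recognition sequence


def mean_site_spacing(site_len: int) -> float:
    """Expected bp between sites in random DNA with equal base frequencies."""
    return 4.0 ** site_len


def cut_fraction_for(target_mean_bp: float, site_len: int = len(SAU3AI_SITE)) -> float:
    """Fraction of sites that must be cut for a given mean fragment size,
    if each site is cut independently (simplified model)."""
    return mean_site_spacing(site_len) / target_mean_bp


print(f"{cut_fraction_for(45_000):.4f}")  # ~0.0057 of sites cut for ~45 kb
```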
F. Serological procedures.

Colonies were screened for the presence of the O111
antigen by immunoblotting. Colonies were grown overnight,
up to 100 per plate, then transferred to nitrocellulose
discs and lysed with 0.5 N HCl. Tween 20 was added to TBS
at 0.05% final concentration for the blocking, incubation
and washing steps. The primary antibody was E. coli O group
111 antiserum, diluted 1:800. The secondary antibody was
goat anti-rabbit IgG labelled with horseradish peroxidase,
diluted 1:5000. The staining substrate was
4-chloro-1-naphthol. Slide agglutination was performed
according to the standard procedure.
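The antibody dilutions above translate into working volumes with simple arithmetic. A minimal sketch, assuming "1:800" means one part antibody in 800 parts final volume (the common convention) and a hypothetical 10 mL of TBS-Tween per blot:

```python
def antibody_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Volume of antibody (µl) for a 1:dilution_factor working solution."""
    return final_volume_ml * 1000.0 / dilution_factor


print(antibody_volume_ul(10, 800))   # primary antiserum 1:800 -> 12.5 µl
print(antibody_volume_ul(10, 5000))  # HRP conjugate 1:5000 -> 2.0 µl
```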

G. Recombinant DNA methods. 

Restriction mapping was based on a combination of
standard methods including single and double digests and
sub-cloning. Deletion derivatives of entire cosmids were
produced as follows: aliquots of 1.8 µg of cosmid DNA were
digested in a volume of 20 µl with 0.25 U of restriction
enzyme for 5-80 min. One half of each aliquot was used to
check the degree of digestion on an agarose gel. The
sample which appeared to give a representative range of
fragments was ligated at 4°C overnight and transformed by
the CaCl2 method into JM109. Selected plasmids were
transformed into Sfl74 by the same method. P4657 was
transformed with pPR1244 by electroporation.

H. DNA hybridisation 

Probe DNA was extracted from agarose gels by
electroelution and was nick-translated using [α-32P]-dCTP.
Chromosomal or plasmid DNA was electrophoresed in 0.8%
agarose and transferred to a nitrocellulose membrane. The
hybridisation and pre-hybridisation buffers contained
either 30% or 50% formamide for low- and high-stringency
probing respectively. Incubation temperatures were 42°C
and 37°C for pre-hybridisation and hybridisation
respectively. Low-stringency washing of filters consisted
of 3 x 20 min washes in 2 x SSC and 0.1% SDS.
High-stringency washing consisted of 3 x 5 min washes in
2 x SSC and 0.1% SDS at room temperature, a 1 h wash in
1 x SSC and 0.1% SDS at 58°C and a 15 min wash in
0.1 x SSC and 0.1% SDS at 58°C.

I. Nucleotide sequencing of the E. coli O111 O antigen gene
cluster (position 3,021-9,981)

Nucleotide sequencing was performed using an ABI 373
automated sequencer (CA, USA). The region between map
positions 3.30 and 7.90 was sequenced using
unidirectional exonuclease III digestion of deletion
families made in PT7T3190 from clones pPR1270 and pPR1272.
Gaps were filled largely by cloning selected fragments
into M13mp18 or M13mp19. The region from map positions
7.90-10.2 was sequenced from restriction fragments in
M13mp18 or M13mp19. Remaining gaps in both regions
were filled by priming from synthetic oligonucleotides
complementary to determined positions along the sequence,
using a single-stranded DNA template in M13 or phagemid.
The oligonucleotides were designed after analysing the
adjacent sequence. All sequencing was performed by the
chain termination method. Sequences were aligned using SAP
[Staden, R., 1982 "Automation of the computer handling of
gel reading data produced by the shotgun method of DNA
sequencing". Nucl. Acids Res. 10: 4731-4751; Staden, R.,
1986 "The current status and portability of our sequence
handling software". Nucl. Acids Res. 14: 217-231]. The
program NIP [Staden, R. 1982 "An interactive graphics
program for comparing and aligning nucleic acid and amino
acid sequences". Nucl. Acids Res. 10: 2951-2961] was used to
find open reading frames and translate them into proteins.
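NIP was used to find open reading frames. As an illustration of what such a search does, not a reimplementation of NIP, here is a minimal forward-strand ORF finder, assuming ATG starts, the standard stop codons TAA/TAG/TGA and a hypothetical minimum length; a full search would also scan the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG-initiated open reading frames on the forward strand.

    Returns (start, end, frame) with 1-based inclusive coordinates; `end`
    is the last base of the stop codon. ORFs without a stop codon before
    the end of the sequence are ignored. Only the first ATG of each ORF
    is reported (nested in-frame ATGs are skipped).
    """
    s = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(s):
            if s[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(s):
                break  # no in-frame stop codon before the end of the sequence
            if (j - i) // 3 >= min_codons:  # codons before the stop
                orfs.append((i + 1, j + 3, frame))
            i = j + 3
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=1))  # [(3, 11, 2)]
```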
J. Isolation of clones carrying the E. coli O111 O antigen
gene cluster

The E. coli O antigen gene cluster was isolated
according to the method of Bastin D.A., et al. [1991
"Molecular cloning and expression in Escherichia coli K-12
of the rfb gene cluster determining the O antigen of an E.
coli O111 strain". Mol. Microbiol. 5(9), 2223-2231].
Cosmid gene banks of M92 chromosomal DNA were established
in the in vivo packaging strain χ2819. From the genomic
bank, 3.3 x 10"^ colonies were screened with E. coli O111
antiserum using an immunoblotting procedure: 5 colonies
(pPR1054, pPR1055, pPR1056, pPR1058 and pPR1287) were
positive. The cosmids from these colonies were packaged in
vivo into lambda particles and transduced into the E. coli
deletion mutant Sfl74, which lacks all O antigen genes. In
this host strain, all plasmids gave positive agglutination
with O111 antiserum. An EcoRI restriction map of the 5
independent cosmids showed that they have a region of
approximately 11.5 kb in common (Figure 1). Cosmid
pPR1058 included sufficient flanking DNA to identify
several chromosomal markers linked to the O antigen gene
cluster and was selected for analysis of the O antigen
gene cluster region.
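The region shared by several overlapping cosmids is the intersection of their insert intervals on a common map: the latest start and the earliest end. A minimal sketch; the five cosmid coordinates below are hypothetical, chosen only so that the shared region matches the 1.35-12.95 interval given for pPR1058 in section K.

```python
def common_region(intervals):
    """Intersection of (start, end) intervals, or None if they do not all overlap."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


# Hypothetical insert coordinates (kb) of five overlapping cosmids on one map.
cosmids = [(0.0, 38.0), (1.35, 40.2), (-5.0, 12.95), (-2.1, 30.0), (0.8, 35.5)]
region = common_region(cosmids)
print(region, round(region[1] - region[0], 2))  # (1.35, 12.95) 11.6
```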

K. Restriction mapping of cosmid pPR1058

Cosmid pPR1058 was mapped in two stages. A
preliminary map was constructed first, and then the region
between map positions 0.00 and 23.10 was mapped in detail,
since it was shown to be sufficient for O111 antigen
expression. Restriction sites for both stages are shown
in Figure 2. The region common to the five cosmid clones
was between map positions 1.35 and 12.95 of pPR1058.

To locate the O antigen gene cluster within pPR1058,
the pPR1058 cosmid was probed with DNA probes covering the
O antigen gene cluster flanking regions from S. enterica LT2
and E. coli K-12. Capsular polysaccharide (cps) genes lie
upstream of the O antigen gene cluster, while the gluconate
dehydrogenase (gnd) gene and the histidine (his) operon
are downstream, the latter being further from the O
antigen gene cluster. The probes used were pPR472
(3.35 kb), carrying the gnd gene of LT2; pPR685 (5.3 kb),
carrying two genes of the cps cluster, cpsB and cpsG, of
LT2; and K350 (16.5 kb), carrying all of the his operon of
K-12. The probes hybridised as follows: pPR472 hybridised to
1.55 kb and 3.5 kb (including 2.7 kb of vector) fragments
of PstI and HindIII double digests of pPR1246 (a
HindIII/EcoRI subclone derived from pPR1058, Figure 2),
which could be located at map positions 12.95-15.1; pPR685
hybridised to a 4.4 kb EcoRI fragment of pPR1058
(including 1.3 kb of vector) located at map positions
0.00-3.05; and K350 hybridised with a 32 kb EcoRI fragment
of pPR1058 (including 4.0 kb of vector), located at map
positions 17.30-45.90. Subclones containing the presumed
gnd region complemented the gnd- edd- strain GB23152: on
gluconate bromothymol blue plates, pPR1244 and pPR1292 in
this host strain gave the green colonies expected of a
gnd+ edd- genotype. The his+ phenotype was restored by
plasmid pPR1058 in the his deletion strain Sfl74 on
minimal medium plates, showing that the plasmid carries
the entire his operon.

It is likely that the O antigen gene cluster region lies between gnd and cps, as in other E. coli and S. enterica strains, and hence between approximate map positions 3.05 and 12.95. To confirm this, deletion derivatives of pPR1058 were made as follows. First, pPR1058 was partially digested with HindIII and self-ligated. Transformants were selected for kanamycin resistance and screened for expression of O111 antigen. Two colonies gave a positive reaction. EcoRI digestion showed that the two colonies hosted identical plasmids, one of which was designated pPR1230, with an insert extending from map position 0.00 to 23.10. Second, pPR1058 was digested with SalI and partially digested with XhoI, and the compatible ends were re-ligated. Transformants were selected with kanamycin and screened for O111 antigen expression. Plasmid DNA of 8 positively reacting clones was checked by EcoRI and XhoI digestion and appeared to be identical. The cosmid from one of these clones was designated pPR1231. The insert of pPR1231 contained the DNA region between map positions 0.00 and 15.10. Third, pPR1231 was partially digested with XhoI and self-ligated, and transformants were selected on spectinomycin/streptomycin plates. Clones were screened for kanamycin sensitivity and, of 10 selected, all had the DNA region from the XhoI site in the vector to the XhoI site at position 4.00 deleted. These clones did not express the O111 antigen, showing that the XhoI site at position 4.00 is within the O antigen gene cluster. One clone was selected and named pPR1288. Plasmids pPR1230, pPR1231 and pPR1288 are shown in Figure 2.
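The reasoning used to bracket the O antigen gene cluster can be expressed as simple interval arithmetic on the map positions (in kb) given above. This is an illustrative sketch only: it assumes that every construct expressing the O111 antigen carries the whole cluster, and it treats the end of the cps probe fragment (3.05) and the start of the gnd probe fragment (12.95) as outer limits; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in kb on the pPR1058 map

def intersect(intervals: List[Interval]) -> Interval:
    """Region shared by all intervals (start > end would mean no overlap)."""
    return max(s for s, _ in intervals), min(e for _, e in intervals)

def covers(insert: Interval, region: Interval) -> bool:
    """True if the insert spans the whole region."""
    return insert[0] <= region[0] and insert[1] >= region[1]

# Inserts of constructs that expressed O111 antigen (map positions from the text).
expressing: Dict[str, Interval] = {"pPR1230": (0.00, 23.10), "pPR1231": (0.00, 15.10)}
# Deletion construct that no longer expressed O111 antigen.
non_expressing: Dict[str, Interval] = {"pPR1288": (4.00, 15.10)}

# The cluster should lie within every expressing insert ...
bound = intersect(list(expressing.values()))
# ... and between the cps probe fragment (ends at 3.05) and the gnd fragment (starts at 12.95).
CPS_END, GND_START = 3.05, 12.95
candidate = (max(bound[0], CPS_END), min(bound[1], GND_START))

print("shared by expressing inserts:", bound)
print("candidate cluster region:", candidate)
for name, insert in non_expressing.items():
    # Losing part of the candidate region is consistent with loss of expression.
    print(name, "covers candidate region:", covers(insert, candidate))
```

The same arithmetic shows that the region common to the five original cosmids (1.35-12.95, i.e. 11.6 kb) agrees with the approximately 11.5 kb estimate from the EcoRI maps.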

L. Analysis of the E. coli O111 O antigen gene cluster (positions 3,021-9,981) nucleotide sequence data

Bastin and Reeves [1995 "Sequence and analysis of the O antigen gene (rfb) cluster of Escherichia coli O111". Gene 164: 17-23] partially characterised the E. coli O111 O antigen gene cluster by sequencing a fragment from map positions 3,021-9,981. Figure 3 shows the gene organisation of positions 3,021-9,981 of the E. coli O111 O antigen gene cluster. orf3 and orf6 have high levels of amino acid identity with wcaH and wcaG (46.3% and 37.2% respectively), and are likely to be similar in function to these sugar biosynthetic pathway genes of the E. coli K-12 colanic acid gene cluster. orf4 and orf5 show high levels of amino acid homology to the manC and manB genes respectively. orf7 shows high-level homology with rfbH, which is an abequose pathway gene. orf8 encodes a protein with 12 transmembrane segments and has similarity in secondary structure to other wzx genes; it is therefore likely to be the O antigen flippase gene.
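Percentage identities such as the 46.3% and 37.2% quoted above are computed over an alignment of two sequences. The alignment program and the denominator used are not stated here, so the sketch below simply assumes a pre-computed pairwise alignment with '-' for gaps and divides identical positions by the number of columns where neither sequence has a gap, which is only one of several conventions. The sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

# Hypothetical aligned protein fragments (not real orf3/WcaH data).
print(percent_identity("MKV-LLAGT", "MKIALLSGT"))
```

Other conventions divide by the length of the shorter sequence or of the whole alignment, which can give noticeably different figures for the same pair of proteins.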
Materials and Methods-part 2

A. Nucleotide sequencing of positions 1 to 3,020 and 9,982 to 14,516 of the E. coli O111 O antigen gene cluster

The subclones which contained novel nucleotide sequences, pPR1231 (map positions 0 to 1,510), pPR1237 (map positions -300 to 2,744), pPR1239 (map positions 2,744 to 4,168), pPR1245 (map positions 9,736 to 12,007) and pPR1246 (map positions 12,007 to 15,300) (Figure 2), were characterised as follows. The distal ends of the inserts of pPR1237, pPR1239 and pPR1245 were sequenced using the M13 forward and reverse primers located in the vector. PCR walking was carried out to sequence further into each insert, using primers based on the sequence data; these primers were tagged with M13 forward or reverse primer sequences for sequencing. This PCR walking procedure was repeated until the entire insert was sequenced. pPR1246 was characterised from position 12,007 to 14,516. The DNA of these subclones was sequenced in both directions. The sequencing reactions were performed using the dideoxy termination method and thermocycling, and reaction products were analysed using fluorescent dye and an ABI automated sequencer (CA, USA).
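The tagging step in PCR walking prepends a universal M13 primer-binding sequence to the 5' end of each gene-specific primer, so that the product can be sequenced with the standard M13 primers. The sketch assumes the commonly used M13 forward (-20) and M13 reverse sequences; the text does not say which M13 variants were used, and the gene-specific primer shown is hypothetical.

```python
# Commonly used universal M13 primer sequences (5'->3'); assumed here,
# since the source does not specify the exact variants.
M13_FORWARD_20 = "GTAAAACGACGGCCAGT"
M13_REVERSE = "CAGGAAACAGCTATGAC"

VALID = set("ACGT")

def tag_primer(gene_specific: str, tag: str) -> str:
    """Return tag + gene-specific primer, i.e. the tag sits at the 5' end."""
    seq = gene_specific.replace(" ", "").upper()
    if not seq or set(seq) - VALID:
        raise ValueError(f"not a plain DNA sequence: {gene_specific!r}")
    return tag + seq

# Hypothetical gene-specific walking primer.
walk_primer = "gct gaa cgt tac cag atc"
print(tag_primer(walk_primer, M13_FORWARD_20))
```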

B. Analysis of the E. coli O111 O antigen gene cluster (positions 1 to 3,020 and 9,982 to 14,516 of Figure 5) nucleotide sequence data

The gene organisation of the regions of the E. coli O111 O antigen gene cluster which were not characterised by Bastin and Reeves [1995 "Sequence and analysis of the O antigen gene (rfb) cluster of Escherichia coli O111". Gene 164: 17-23] (positions 1 to 3,020 and 9,982 to 14,516) is shown in Figure 3. There are two open reading frames in region 1 (positions 1 to 3,020), and four open reading frames are predicted in region 2 (positions 9,982 to 14,516). The position of each gene is listed in Table 9.

The deduced amino acid sequence of orf1 (wbdH) shares about 64% similarity with that of the rfp gene of Shigella dysenteriae. Rfp and WbdH have very similar hydrophobicity plots, and both have a very convincing predicted transmembrane segment in a corresponding position. rfp encodes a galactosyl transferase involved in the synthesis of the LPS core; thus wbdH is likely to be a galactosyl transferase gene. orf2 has 85.7% identity at the amino acid level to the gmd gene identified in the E. coli K-12 colanic acid gene cluster and is likely to be a gmd gene. orf9 encodes a protein with 10 predicted transmembrane segments and a large cytoplasmic loop. This inner membrane topology is a characteristic feature of all known O antigen polymerases, so it is likely that orf9 encodes an O antigen polymerase, wzy. orf10 (wbdL) has a deduced amino acid sequence with low homology to Lsi2 of Neisseria gonorrhoeae. Lsi2 is responsible for adding GlcNAc to galactose in the synthesis of lipooligosaccharide. Thus it is likely that wbdL is either a colitose or a glucose transferase gene. orf11 (wbdM) shares high-level nucleotide and amino acid similarity with TrsE of Yersinia enterocolitica. TrsE is a putative sugar transferase, so it is likely that wbdM encodes the colitose or glucose transferase.
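Transmembrane segment counts such as those given for WbdH, orf9 and orf8 are typically read from a hydropathy plot. The text does not name the method used; as an assumption, the sketch below uses the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6 (values commonly used for spotting membrane-spanning helices) and counts separate runs of windows above the threshold. It is a rough screen, not a substitute for a dedicated topology predictor, and the test protein is hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle score of each full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def count_hydrophobic_segments(protein: str, window: int = 19,
                               threshold: float = 1.6) -> int:
    """Count separate runs of windows whose mean score exceeds the threshold."""
    count, inside = 0, False
    for value in hydropathy(protein, window):
        if value > threshold and not inside:
            count += 1  # entering a new hydrophobic stretch
        inside = value > threshold
    return count

# Hypothetical protein: two 20-residue hydrophobic stretches between charged linkers.
linker = "DEKRSNQDEKRSNQDEKRSN"
helix = "LLIVALLIVALLIVALLIVA"
toy = linker + helix + linker + helix + linker
print(count_hydrophobic_segments(toy))
```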

In summary, three putative transferase genes and an O antigen polymerase gene were identified at map positions 1 to 3,020 and 9,982 to 14,516 of the E. coli O111 O antigen gene cluster. A search of GenBank has shown that there are no genes with significant similarity at the nucleotide sequence level for two of the three putative transferase genes or for the polymerase gene. Figure 5 provides the nucleotide sequence of the O111 antigen gene cluster.

Materials and Methods-part 3 

A. PCR amplification of the O157 antigen gene cluster from an E. coli O157:H7 strain (Strain C664-1992, from Statens Serum Institut, 5 Artillerivej, 2300 Copenhagen S, Denmark)

The E. coli O157 O antigen gene cluster was amplified by long PCR [Cheng et al. 1994, "Effective amplification of long targets from cloned inserts and human genomic DNA". P.N.A.S. USA 91: 5695-569] using one primer (primer #412: att ggt agc tgt aag cca agg gcg gta gcg t) based on the JumpStart sequence usually found in the promoter region of O antigen gene clusters [Hobbs, et al. 1994 "The JumpStart sequence: a 39 bp element common to several polysaccharide gene clusters". Mol. Microbiol. 12: 855-856], and another primer (#482: cac tgc cat acc gac gac gcc gat ctg ttg ctt gg) based on the gnd gene usually found downstream of the O antigen gene cluster. Long PCR was carried out using the Expand Long Template PCR System from Boehringer Mannheim (Castle Hill NSW Australia), and products 14 kb in length from several reactions were combined and purified using the Promega Wizard PCR Preps DNA Purification System (Madison WI USA). The PCR product was then extracted with phenol and twice with ether, precipitated with 70% ethanol, and resuspended in 40 µL of water.
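A quick sanity check on long-PCR primers such as #412 and #482 is to compute their length, GC content and an approximate melting temperature. The sketch below uses the basic formula Tm = 64.9 + 41(G+C-16.4)/N, an empirical approximation for oligonucleotides longer than about 13 nt; it ignores salt and primer concentration, so the values are only indicative and are not part of the original protocol.

```python
def clean(seq: str) -> str:
    """Remove spaces and normalise to upper case."""
    return seq.replace(" ", "").upper()

def gc_count(seq: str) -> int:
    s = clean(seq)
    return s.count("G") + s.count("C")

def basic_tm(seq: str) -> float:
    """Approximate Tm (deg C) = 64.9 + 41 * (G + C - 16.4) / N."""
    s = clean(seq)
    return 64.9 + 41.0 * (gc_count(s) - 16.4) / len(s)

# Primer sequences as given in the text.
primers = {
    "#412 (JumpStart)": "att ggt agc tgt aag cca agg gcg gta gcg t",
    "#482 (gnd)": "cac tgc cat acc gac gac gcc gat ctg ttg ctt gg",
}
for name, seq in primers.items():
    s = clean(seq)
    print(f"{name}: {len(s)} nt, GC {100 * gc_count(s) / len(s):.1f}%, "
          f"Tm ~{basic_tm(s):.1f} C")
```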

B. Construction of a random DNase I bank

Two aliquots containing about 150 ng of DNA each were subjected to DNase I digestion using the Novagen DNase I Shotgun Cleavage kit (Madison WI USA) with the following modified protocol. Each aliquot was diluted into 45 µL of 0.05 M Tris-HCl (pH 7.5), 0.05 mg/mL BSA and 10 mM MnCl2. 5 µL of a 1:3000 or 1:4500 dilution of DNase I (Novagen) (Madison WI USA) in the same buffer was added to the two tubes respectively, and after incubation at 15°C for 5 min, 10 µL of stop buffer (100 mM EDTA, 30% glycerol, 0.5% Orange G, 0.075% xylene cyanol) (Novagen) (Madison WI USA) was added. The DNA from the two DNase I reaction tubes was then combined and fractionated on a 0.8% LMT agarose gel, and the gel segment containing DNA of about 1 kb in size (about 1.5 mL of agarose) was excised. DNA was extracted from the agarose using Promega Wizard PCR Preps DNA Purification (Madison WI USA) and resuspended in 200 µL of water, before being extracted with phenol and twice with ether, and precipitated. The DNA was then resuspended in 17.25 µL of water and subjected to T4 DNA polymerase repair and single dA tailing using the Novagen Single dA Tailing Kit (Madison WI USA). The reaction product (85 µL containing about 8 ng of DNA) was then extracted once with chloroform:isoamyl alcohol (24:1) and ligated to 3 x 10^? pmol of pGEM-T (Promega) (Madison WI USA) in a total volume of 100 µL. Ligation was carried out overnight at 4°C, and the ligated DNA was precipitated and resuspended in 20 µL of water before being electroporated into E. coli strain JM109 and plated out on BCIG-IPTG plates to give a bank.
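Two figures implicit in this protocol can be checked with simple arithmetic: the effective DNase I dilution in each digestion, and the molar amount of ~1 kb insert available for ligation. The sketch assumes that the volumes are in µL (5 µL of enzyme dilution added to 45 µL of diluted DNA) and uses the common approximation of 660 g/mol per base pair of double-stranded DNA; neither calculation appears in the original.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)

def final_dilution(stock_dilution: float, added_ul: float, sample_ul: float) -> float:
    """Overall dilution factor after adding an enzyme dilution to a sample."""
    total_ul = added_ul + sample_ul
    return stock_dilution * total_ul / added_ul

def dsdna_pmol(ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) of a given length (bp) to pmol."""
    grams = ng * 1e-9
    mol = grams / (length_bp * AVG_BP_MASS)
    return mol * 1e12

for stock in (3000, 4500):
    print(f"1:{stock} stock -> 1:{final_dilution(stock, 5, 45):.0f} in the reaction")
print(f"8 ng of ~1 kb fragments ~ {dsdna_pmol(8, 1000):.4f} pmol")
```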

C. Sequencing

DNA templates from clones of the bank were prepared for sequencing using the 96-well format plasmid DNA miniprep kit from Advanced Genetic Technologies Corp (Gaithersburg MD USA). The inserts of these clones were sequenced from one or both ends using the standard M13 sequencing primer sites located in the pGEM-T vector. Sequencing was carried out on an ABI377 automated sequencer (CA USA) as described above, after carrying out the sequencing reaction on an ABI Catalyst (CA USA). Sequence gaps and areas of inadequate coverage were PCR amplified directly from O157 chromosomal DNA using primers based on the sequence data already obtained, and sequenced using the standard M13 sequencing primer sites attached to the PCR primers.
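Identifying "sequence gaps and areas of inadequate coverage" amounts to stacking the read intervals along the template and reporting stretches covered fewer than a chosen number of times. The sketch below is a hypothetical illustration: the read coordinates and the minimum depth of 2 (for example, to require reads from both strands) are assumptions, not values from the text.

```python
from typing import List, Tuple

def low_coverage_regions(reads: List[Tuple[int, int]], length: int,
                         min_depth: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) stretches with depth < min_depth.

    Each read is a 1-based inclusive (start, end) interval on the template.
    """
    # Difference array: +1 at read start, -1 just after read end.
    delta = [0] * (length + 2)
    for start, end in reads:
        delta[start] += 1
        delta[end + 1] -= 1
    regions, depth, run_start = [], 0, None
    for pos in range(1, length + 1):
        depth += delta[pos]
        if depth < min_depth and run_start is None:
            run_start = pos
        elif depth >= min_depth and run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, length))
    return regions

# Hypothetical reads on a 3,000 bp stretch of template.
reads = [(1, 600), (1, 650), (500, 1100), (550, 1200),
         (1500, 2100), (1450, 3000), (2000, 3000)]
print(low_coverage_regions(reads, 3000))
```

Each reported stretch would then be a target for a pair of PCR primers designed from the flanking, well-covered sequence.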

D. Analysis of the E. coli O157 O antigen gene cluster nucleotide sequence data
Sequence data were processed and analysed using the Staden programs [Staden, R., 1982 "Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing". Nuc. Acid Res. 10: 4731-4751; Staden, R., 1986 "The current status and portability of our sequence handling software". Nuc. Acid Res. 14: 217-231; Staden, R. 1982 "An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences". Nuc. Acid Res. 10: 2951-2961]. Figure 4 shows the structure of the E. coli O157 O antigen gene cluster. Twelve open reading frames were predicted from the sequence data, and the nucleotide and amino acid sequences of all these genes were then used to search the GenBank database for indications of the possible function and specificity of these genes. The position of each gene is listed in Table 9. The nucleotide sequence is presented in Figure 6.
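The open reading frame prediction step can be illustrated with a minimal forward-strand finder that reports ATG-to-stop frames above a length cut-off. This is an assumed, simplified stand-in for the Staden programs, not their method: it scans only the three forward frames, accepts only ATG starts, and the 100-codon default minimum is an arbitrary choice; a real analysis would also scan the reverse strand and consider alternative start codons.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ATG...stop ORFs.

    start/end are 0-based, end-exclusive, and include the stop codon;
    length is counted in codons including the stop. Only the first ATG
    after the previous stop in each frame is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

# Toy example: ATG + 4 codons + TAA in frame 1.
toy = "C" + "ATG" + "GCT" * 4 + "TAA" + "CC"
print(find_orfs(toy, min_codons=3))
```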

orfs 10 and 11 showed high-level identity to manC and manB and were named manC and manB respectively. orf7 showed 89% identity (at the amino acid level) to the gmd gene of the E. coli colanic acid capsule gene cluster (Stevenson, G., et al. 1996 "Organisation of the Escherichia coli K-12 gene cluster responsible for production of the extracellular polysaccharide colanic acid". J. Bacteriol. 178: 4885-4893) and was named gmd.

orf8 showed 79% and 69% identity (at the amino acid level) respectively to wcaG of the E. coli colanic acid capsule gene cluster and to the wbcJ (orf14.8) gene of the Yersinia enterocolitica O8 O antigen gene cluster (Zhang, L. et al. 1997 "Molecular and chemical characterization of the lipopolysaccharide O-antigen and its role in the virulence of Y. enterocolitica serotype O8". Mol. Microbiol. 23: 63-76). Colanic acid and the Yersinia O8 O antigen both contain fucose, as does the O157 O antigen. Two enzymatic steps are required for GDP-L-fucose synthesis from GDP-4-keto-6-deoxy-D-mannose, the product of the reaction catalysed by the gmd gene product. However, it has recently been shown (Tonetti, M. et al. 1996 "Synthesis of GDP-L-fucose by the human FX protein". J. Biol. Chem. 271: 27274-27279) that the human FX protein has "significant homology" with the wcaG gene product (referred to as Yefb in that paper), and that the FX protein carries out both reactions to convert GDP-4-keto-6-deoxy-D-mannose to GDP-L-fucose. We believe that this makes a very strong case for orf8 carrying out these two steps, and propose to name the gene fcl. Supporting the view that one enzyme carries out both functions is the observation that there are no genes other than manB, manC, gmd and fcl with similar levels of similarity between the three bacterial gene clusters for fucose-containing structures.
orf5 is very similar to wbeE (rfbE) of Vibrio cholerae O1, which is thought to be the perosamine synthetase that converts GDP-4-keto-6-deoxy-D-mannose to GDP-perosamine (Stroeher, U.H. et al. 1995 "A putative pathway for perosamine biosynthesis is the first function encoded within the rfb region of Vibrio cholerae O1". Gene 166: 33-42). The V. cholerae O1 and E. coli O157 O antigens contain perosamine and N-acetyl-perosamine respectively. The V. cholerae O1 manA, manB, gmd and wbeE genes are the only genes of the V. cholerae O1 gene cluster with significant similarity to genes of the E. coli O157 gene cluster, and we believe that our observations both confirm the prediction made for the function of wbeE of V. cholerae and show that orf5 of the O157 gene cluster encodes GDP-perosamine synthetase. orf5 is therefore named per. orf5 plus about 100 bp of the upstream region (positions 4022-5308) was previously sequenced by Bilge, S.S. et al. [1996 "Role of the Escherichia coli O157:H7 O side chain in adherence and analysis of an rfb locus". Infect. Immun. 64: 4795-4801].

orf12 shows high-level similarity to the conserved region of about 50 amino acids of various members of an acetyltransferase family (Lin, W., et al. 1994 "Sequence analysis and molecular characterisation of genes required for the biosynthesis of type 1 capsular polysaccharide in Staphylococcus aureus". J. Bacteriol. 176: 7005-7016), and we believe it is the N-acetyltransferase that converts GDP-perosamine to GDP-perNAc. orf12 has been named wbdR.

The genes manB, manC, gmd, fcl, per and wbdR account for all of the expected biosynthetic pathway genes of the O157 gene cluster.
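The pathway assignments made above can be summarised as a small graph of nucleotide-sugar conversions, which makes it easy to list the genes needed for each O157 precursor. The ManB and ManC steps (mannose-6-phosphate to mannose-1-phosphate, and mannose-1-phosphate to GDP-mannose) are the standard assignments for these enzymes and are added here for completeness; the remaining steps follow the text, with fcl treated as a single step, as proposed. The data structure is illustrative only.

```python
# product -> (substrate, gene); fcl's two reactions are collapsed into one step.
PATHWAY = {
    "mannose-1-P": ("mannose-6-P", "manB"),
    "GDP-mannose": ("mannose-1-P", "manC"),
    "GDP-4-keto-6-deoxy-D-mannose": ("GDP-mannose", "gmd"),
    "GDP-L-fucose": ("GDP-4-keto-6-deoxy-D-mannose", "fcl"),
    "GDP-perosamine": ("GDP-4-keto-6-deoxy-D-mannose", "per"),
    "GDP-perNAc": ("GDP-perosamine", "wbdR"),
}

def genes_for(product: str) -> list[str]:
    """Walk back from a product to the first substrate; return genes in pathway order."""
    genes = []
    while product in PATHWAY:
        substrate, gene = PATHWAY[product]
        genes.append(gene)
        product = substrate
    return list(reversed(genes))

for target in ("GDP-L-fucose", "GDP-perNAc"):
    print(target, "<-", " -> ".join(genes_for(target)))
```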

The remaining biosynthetic step(s) required are for synthesis of UDP-GalNAc from UDP-Glc. It has been proposed (Zhang, L., et al. 1997 "Molecular and chemical characterisation of the lipopolysaccharide O-antigen and its role in the virulence of Yersinia enterocolitica serotype O8". Mol. Microbiol. 23: 63-76) that in Yersinia enterocolitica UDP-GalNAc is synthesised from UDP-GlcNAc by a homologue of galactose epimerase (GalE), for which there is a galE-like gene in the Yersinia enterocolitica O8 gene cluster. In the case of O157 there is no galE homologue in the gene cluster, and it is not clear how UDP-GalNAc is synthesised. It is possible that the galactose epimerase encoded by the galE gene in the gal operon can carry out conversion of UDP-GlcNAc to UDP-GalNAc in addition to conversion of UDP-Glc to UDP-Gal. There do not appear to be any gene(s) responsible for UDP-GalNAc synthesis in the O157 gene cluster.

orf4 shows similarity to many wzx genes and is named wzx, and orf2 shows similarity in the secondary structure of its predicted protein to other wzy genes and is for that reason named wzy.

The orf1, orf3 and orf6 gene products all have characteristics of transferases, and have been named wbdN, wbdO and wbdP respectively. The O157 O antigen has 4 sugars, so 4 transferases are expected. The first transferase to act would put a sugar phosphate onto undecaprenol phosphate. The two transferases known to perform this function, WbaP (RfbP) and WecA (Rfe), transfer galactose phosphate and N-acetyl-glucosamine phosphate respectively to undecaprenol phosphate. Neither of these sugars is present in the O157 structure.

Further, none of the presumptive transferases in the O157 gene cluster has the transmembrane segments found in WecA and WbaP, which transfer a sugar phosphate to undecaprenol phosphate; such segments would be expected in any protein that transferred a sugar to undecaprenol phosphate, since the latter is embedded within the membrane.

The wecA gene, whose product transfers GlcNAc-P to undecaprenol phosphate, is located in the Enterobacterial Common Antigen (ECA) gene cluster. It functions in ECA synthesis in most and perhaps all E. coli strains, and also in O antigen synthesis for those strains which have GlcNAc as the first sugar in the O unit.

It appears that WecA acts as the transferase for addition of GalNAc-1-P to undecaprenol phosphate for the Yersinia enterocolitica O8 O antigen [Zhang et al. 1997 "Molecular and chemical characterisation of the lipopolysaccharide O antigen and its role in the virulence of Yersinia enterocolitica serotype O8". Mol. Microbiol. 23: 63-76], and it perhaps does so here, as the O157 structure includes GalNAc. WecA has also been reported to add glucose-1-phosphate to undecaprenol phosphate in E. coli O8 and O9 strains, and an alternative possibility for transfer of the first sugar to undecaprenol phosphate is WecA-mediated transfer of glucose, as there is a glucose residue in the O157 O antigen. In either case, the requisite number of transferase genes is present if GalNAc or Glc is transferred by WecA and the side chain Glc is transferred by a transferase outside of the O antigen gene cluster.

orf9 shows high-level similarity (44% identity at the amino acid level, same length) with the wcaH gene of the E. coli colanic acid capsule gene cluster. The function of this gene is unknown, and we give orf9 the name wbdQ.

The DNA between manB and wbdR has strong sequence similarity to one of the H-repeat units of E. coli K-12. Both of the inverted repeat sequences flanking this region are still recognisable, each with two of the 11 bases changed. The H-repeat-associated protein-encoding gene located within this region has a 267 base deletion and mutations in various positions. It seems that the H-repeat unit has been associated with this gene cluster for a long period of time since it translocated to the gene cluster, perhaps playing a role in assembly of the gene cluster, as has been proposed in other cases.

Materials and Methods-part 4

To test our hypothesis that O antigen genes for transferases and the wzx and wzy genes are more specific than pathway genes for diagnostic PCR, we first carried out PCR using primers for all the E. coli O16 O antigen genes (Table 7). PCR was then carried out using primers for the E. coli O111 transferase, wzx and wzy genes (Tables 8 and 8A). PCR was also carried out using primers for the E. coli O157 transferase, wzx and wzy genes (Tables 9 and 9A).

Chromosomal DNA from the 166 serogroups of E. coli available from Statens Serum Institut, 5 Artillerivej, 2300 Copenhagen, Denmark was isolated using the Promega Genomic (Madison WI USA) isolation kit. Note that 164 of the serogroups are described by Ewing W.H.: Edwards and Ewing's "Identification of the Enterobacteriaceae", Elsevier, Amsterdam 1986, and that they are numbered 1-171, with numbers 31, 47, 67, 72, 93, 94 and 122 no longer valid. Of the two serogroup 19 strains, we used 19ab strain F8188-41. Lior H. 1994 ["Classification of Escherichia coli". In Escherichia coli in domestic animals and humans, pp 31-72. Edited by C.L. Gyles, CAB International] adds two more, numbered 172 and 173, to give the 166 serogroups used. Pools containing 5 to 8 samples of DNA per pool were made. Pool numbers 1 to 19 (Table 4) were used in the E. coli O111 and O157 assays. Pool numbers 20 to 28 were also used in the O111 assay, and pool numbers 22 to 24 contained E. coli O111 DNA and were used as positive controls (Table 5). Pool numbers 29 to 42 were also used in the O157 assay, and pool numbers 31 to 36 contained E. coli O157 DNA and were used as positive controls (Table 6). Pool numbers 2 to 20, 30, 43 and 44 were used in the E. coli O16 assay (Tables 4 to 6). Pool number 44 contained DNA of E. coli K-12 strains C600 and WG1 and was used as a positive control, as between them they have all of the E. coli K-12 O16 O antigen genes.

PCR reactions were carried out under the following conditions: denaturation, 94°C for 30 s; annealing, temperature varies (refer to Tables), 30 s; extension, 72°C for 1 min; 30 cycles. The PCR reaction was carried out in a volume of 25 µL for each pool. After the PCR reaction, 10 µL of PCR product from each pool was run on an agarose gel to check for amplified DNA.
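For run planning, the cycling programme can be written down as data and its minimum duration computed. This sketch assumes that the 30' entries in the programme denote 30 seconds, counts only programmed hold times (ramping, and any initial denaturation or final extension, are not specified in the text and are ignored), and uses 55°C purely as an example annealing temperature.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int

def hold_minutes(steps: list[Step], cycles: int) -> float:
    """Total programmed hold time in minutes, excluding ramp times."""
    return cycles * sum(step.seconds for step in steps) / 60

program = [
    Step("denaturation", 94, 30),
    Step("annealing", 55, 30),  # example value; varies per primer pair (see Tables)
    Step("extension", 72, 60),
]
print(f"{hold_minutes(program, 30):.0f} min of hold time over 30 cycles")
```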

Each E. coli chromosomal DNA sample was checked by gel electrophoresis for the presence of chromosomal DNA and by PCR amplification of the E. coli mdh gene using oligonucleotides based on E. coli K-12 [Boyd et al. (1994) "Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica". Proc. Natl. Acad. Sci. USA 91: 1280-1284]. Chromosomal DNA samples from other bacteria were only checked by gel electrophoresis of chromosomal DNA.

A. Primers based on the E. coli O16 O antigen gene cluster sequence

The O antigen gene cluster of E. coli O16 was the only typical E. coli O antigen gene cluster that had been fully sequenced prior to that of O111, and we chose it for testing our hypothesis. One pair of primers for each gene was tested against pools 2 to 20, 30 and 43 of E. coli chromosomal DNA. The primers, annealing temperatures and functional information for each gene are listed in Table 7.

For the five pathway genes, there were 17/21, 13/21, 0/21, 0/21 and 0/21 positive pools for rmlB, rmlD, rmlA, rmlC and glf respectively (Table 7). For the wzx, wzy and three transferase genes there were no positives amongst the 21 pools of E. coli chromosomal DNA tested (Table 7). In each case pool #44 gave a positive result.
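The specificity criterion applied here (no amplification in any of the 21 non-target pools, while the positive-control pool #44 amplifies) can be encoded directly. The sketch uses the positive-pool counts reported above for the pathway genes and for wzx and wzy; the function name and the strict zero-positive rule are our own framing of the criterion.

```python
# Positive pools out of the 21 non-target pools tested (counts from the text);
# in every case the positive-control pool #44 amplified.
results = {
    "rmlB": 17, "rmlD": 13, "rmlA": 0, "rmlC": 0, "glf": 0,
    "wzx": 0, "wzy": 0,
}
POOLS_TESTED = 21

def is_specific(positive_pools: int, control_positive: bool = True) -> bool:
    """Specific = amplifies the positive control but none of the other pools."""
    return control_positive and positive_pools == 0

for gene, positives in results.items():
    verdict = "specific" if is_specific(positives) else "not specific"
    print(f"{gene}: {positives}/{POOLS_TESTED} positive pools -> {verdict}")
```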

B. Primers based on the E. coli O111 O antigen gene cluster sequence

One to four pairs of primers for each of the transferase, wzx and wzy genes of O111 were tested against pools 1 to 21 of E. coli chromosomal DNA (Table 8). For wbdH, four pairs of primers, which bind to various regions of this gene, were tested and found to be specific for O111, as there was no amplified DNA of the correct size in any of those 21 pools of E. coli chromosomal DNA tested. Three pairs of primers for wbdM were tested, and they are all specific, although primers #985/#986 produced a band of the wrong size from one pool. Three pairs of primers for wzx were tested and they were all specific. Two pairs of primers were tested for wzy; both are specific, although #980/#983 gave a band of the wrong size in all pools. One pair of primers for wbdL was tested and found to be non-specific, and therefore no further tests were carried out. Thus wzx, wzy and two of the three transferase genes are highly specific to O111. Bands of the wrong size found in amplified DNA are assumed to be due to chance hybridisation of genes widely present in E. coli. The primers, annealing temperatures and positions for each gene are in Table 8.
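Deciding whether a band is "of the correct size" requires the expected amplicon length, which follows from the primer binding positions. The sketch assumes that both primer positions are given on the same coordinate system, with primer 1 as the forward primer and primer 2 as the reverse primer, so the product runs from the first base of primer 1 to the last base of primer 2; the ±5% tolerance for matching a gel band is an arbitrary assumption. The example positions are the H2 pair listed in Table 3 (568-587 and 1039-1056).

```python
def amplicon_length(fwd: tuple[int, int], rev: tuple[int, int]) -> int:
    """Product length for primers given as (first, last) positions on the same strand."""
    return rev[1] - fwd[0] + 1

def matches_predicted(band_bp: float, expected_bp: int, tolerance: float = 0.05) -> bool:
    """True if a gel band lies within +/- tolerance (fraction) of the expected size."""
    return abs(band_bp - expected_bp) <= tolerance * expected_bp

expected = amplicon_length((568, 587), (1039, 1056))
print(expected)                          # predicted product size in bp
print(matches_predicted(500, expected))  # a band of about 500 bp
print(matches_predicted(900, expected))  # a band of the wrong size
```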

The O111 assay was also performed using pools including DNA from O antigen-expressing Yersinia pseudotuberculosis, Shigella boydii and Salmonella enterica strains (Table 8A). None of the oligonucleotides derived from wbdH, wzx, wzy or wbdM gave amplified DNA of the correct size with these pools. Notably, pool number 25 includes S. enterica Adelaide, which has the same O antigen as E. coli O111: this pool did not give a positive PCR result for any of the primers tested, indicating that these genes are highly specific for E. coli O111.

Each of the 12 pairs binding to wbdH, wzx, wzy and wbdM produces a band of the predicted size with the pools containing O111 DNA (pools number 22 to 24). As pools 22 to 24 included DNA from all strains present in pool 21 plus O111 strain DNA (Table 5), we conclude that the 12 pairs of primers all give a positive PCR test with each of three unrelated O111 strains but not with any other strains tested. Thus these genes are highly specific for E. coli O111.
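The inference drawn from pools 21 to 24 (the same background strains with and without an O111 strain) is a simple form of pooled-screen deconvolution: any strain present in a negative pool is cleared, so amplification in the remaining pools must come from strains found only in positive pools. The strain labels below are hypothetical placeholders, not the contents of Table 5.

```python
def candidate_sources(pools: dict[str, set[str]], positive: set[str]) -> set[str]:
    """Strains that could explain the positive pools.

    A strain is cleared if it appears in any negative pool; candidates are
    the uncleared strains found in positive pools.
    """
    cleared = set().union(*(s for name, s in pools.items() if name not in positive))
    in_positive = set().union(*(pools[name] for name in positive))
    return in_positive - cleared

# Hypothetical background strains shared by all four pools.
background = {"strain A", "strain B", "strain C", "strain D", "strain E"}
pools = {
    "21": set(background),
    "22": background | {"O111 strain 1"},
    "23": background | {"O111 strain 2"},
    "24": background | {"O111 strain 3"},
}
print(sorted(candidate_sources(pools, positive={"22", "23", "24"})))
```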

C. Primers based on the E. coli O157 O antigen gene cluster sequence

Two or three primer pairs for each of the transferase, wzx and wzy genes of O157 were tested against E. coli chromosomal DNA of pools 1 to 19, 29 and 30 (Table 9). For wbdN, three pairs of primers, which bind to various regions of this gene, were tested and found to be specific for O157, as there was no amplified DNA in any of those 21 pools of E. coli chromosomal DNA tested. Three pairs of primers for wbdO were tested, and they are all specific, although primers #1211/#1212 produced two or three bands of the wrong size from all pools. Three pairs of primers were tested for wbdP and they were all specific. Two pairs of primers were tested for wbdR and both were specific. For wzy, three pairs of primers were tested and all were specific, although primer pair #1203/#1204 produced one or three bands of the wrong size in each pool. For wzx, two pairs of primers were tested and both were specific, although primer pair #1217/#1218 produced 2 bands of the wrong size in 2 pools and 1 band of the wrong size in 7 pools. Bands of the wrong size found in amplified DNA are assumed to be due to chance hybridisation of genes widely present in E. coli. The primers, annealing temperatures and functional information for each gene are in Table 9.

The O157 assay was also performed using pools 37 to 42, including DNA from O antigen-expressing Yersinia pseudotuberculosis, Shigella boydii, Yersinia enterocolitica O9, Brucella abortus and Salmonella enterica strains (Table 9A). None of the oligonucleotides derived from wbdN, wzy, wbdO, wzx, wbdP or wbdR reacted specifically with these pools, except that primer pair #1203/#1204 produced two bands with Y. enterocolitica O9, one of which is the same size as that from the positive control. Primer pair #1203/#1204 binds to wzy. The predicted secondary structures of Wzy proteins are generally similar, although there is very low similarity at the amino acid or DNA level among the sequenced wzy genes. Thus it is possible that Y. enterocolitica O9 has a wzy gene closely related to that of E. coli O157. It is also possible that this band is due to chance hybridisation of another gene, as the other two wzy primer pairs (#1205/#1206 and #1207/#1208) did not produce any band with Y. enterocolitica O9. Notably, pool number 37 includes S. enterica Landau, which has the same O antigen as E. coli O157, and pools 38 and 39 contain DNA of B. abortus and Y. enterocolitica O9, which cross-react serologically with E. coli O157. This result indicates that these genes are highly O157-specific, although one primer pair may have cross-reacted with Y. enterocolitica O9.

Each of the 16 pairs binding to wbdN, wzx, wzy, wbdO, wbdP and wbdR produces a band of the predicted size with the pools containing O157 DNA (pools number 31 to 36). As pool 29 included DNA from all strains present in pools 31 to 36 other than O157 strain DNA (Table 6), we conclude that the 16 pairs of primers all give a positive PCR test with each of the five unrelated O157 strains.

Thus PCR using primers based on the genes wbdN, wzy, wbdO, wzx, wbdP and wbdR is highly specific for E. coli O157, giving positive results with each of six unrelated O157 strains, while only one primer pair gave a band of the expected size with one of three strains with O antigens known to cross-react serologically with E. coli O157.
TABLE 1

H7 strains used in this work in addition to the typing strains

Name used in this study | Serotype | Original name | Source*
M527 | O157:H7 | C664-1992 | a
M917 | O18ac:H7 | A57 | IMVS
M918 | O18ac:H7 | A62 | IMVS
M973 | O2:H7 | A1107 | CDC
M1004 | O157:H7 | EH7 | b
M1179 | O18ac:H7 | D-M3291/54 | IMVS
M1200 | O7:H7 | A64 | c
M1211 | O19ab:H7 | F8188-41 | IMVS
M1328 | O53:H7 | 14097 | IMVS

* Sources:
a. Statens Serum Institut, Copenhagen, Denmark.
b. Dr. R. Brown of Royal Children's Hospital, Melbourne, Australia.
c. Max-Planck Institut für molekulare Genetik, Berlin, Germany.
d. Dr. P. Tarr of Children's Hospital and Medical Center, University of Washington, USA.
IMVS: Institute of Medical and Veterinary Science, Adelaide, Australia.
CDC: Centers for Disease Control and Prevention, Atlanta, USA.
TABLE 2

Oligonucleotides used to PCR amplify fliC genes from different H typing strains

H type | Annealing temperature (°C) | Primers used
1 | | (illegible)
2 | 55 | #1285/#1286
3 | 55 | #1285/#1286
4 | 50 | #1431/#1432
5 | 60 | #1285/#1286
6 | 55 | #1575/#1576
7 | 55 | #1575/#1576
8 | 55 | #1431/#1432
9 | 60 | #1575/#1576
10 | 55 | #1575/#1576
11 | 55 | #1285/#1286
12 | 60 | #1575/#1576
14 | 60 | #1575/#1576
15 | 60 | #1575/#1576
16 | 60 | #1575/#1576
17 | 60 | #1417/#1418
18 | 60 |
19 | 60 |
20 | 60 |
21 | 55 | (illegible)
23 | 60 | #1575/#1576
24 | 60 |
25 | 60 | #1417/#1418
26 | 60 |
27 | 50 |
28 | 60 |
29 | 60 | #1285/#1286
30 | 60 | #1575/#1576
31 | 60 | #1575/#1576
32 | 60 | #1575/#1576
33 | 60 | #1285/#1286
34 | 55 | #1575/#1576
35 | 50 | #1431/#1432
37 | 60 | #1285/#1286
38 | 60 | #1285/#1286
39 | 55 | #1285/#1286
40 | 55 | #1285/#1286
41 | 60 | #1575/#1576
42 | 60 | #1285/#1286
43 | 60 | #1575/#1576
44 | 60 | #1285/#1286
45 | 60 | #1575/#1576
46 | 60 | #1575/#1576
47 | 55 | #1285/#1286
48 | 60 | #1575/#1576
49 | 60 | #1575/#1576
50 | 60 | #1285/#1286
51 | 60 | #1575/#1576
52 | 60 | #1575/#1576
54 | 50 | #1431/#1432
55 | 60 | #1285/#1286
56 | 60 | #1285/#1286
TABLE 3 Specific H type oligonucleotide primers

H     Flagellin gene sequence         Positions of           Positions of             Other flagellin    Loci other than
type  used for primer choice          primer 1               primer 2                 gene(s) highly     fliC encoding
                                                                                      similar to this    the H antigen

1     No (obtained from GenBank)
2     yes                             568-587                1039-1056
3     yes                             649-666                925-942                                     flkA
4                                     466-483                628-648                  44, 17
5     yes                             697-714                877-897
6     yes                             565-585                799-816
7     yes                             553-570 (primer #1806) 1483-1500 (primer #1809)
8     yes                             562-579                1045-1062                40
9     yes                             616-633                838-855
10    yes                             559-579                697-717                  50
11    yes                             586-606*               790-810*
12    yes                             745-765                1024-1041                1
14    yes                             586-606                793-813
15    yes                             640-660                817-834                  16
16    No                                                                              15
17    No                                                                              4 and 44
18    yes                             589-606                802-819
19    yes                             607-624                838-855
20    yes                             574-591                760-780
21    yes                             676-693**              862-879**                47
23    yes                             637-654                1336-1353
24    yes                             496-516                772-792
25    yes                             529-549                703-723
26    yes                             553-570                772-789
27    yes                             685-702                799-819
28    yes                             592-609                778-798
29    yes                             538-555                757-774
30    yes                             814-831                943-962
31    yes                             571-588                790-807
32    yes                             814-831                1057-1074
33    yes                             553-570                718-735
34    yes                             568-585                796-816
35    no (non-functional gene)        769-789*               1045-1065*
36    No (PCR generated two bands)                                                                       flkA
37    yes                             520-537                715-735
38    yes                             553-573                709-729                  55
39    yes                             556-573                718-735
40    No                                                                              8
41    yes                             598-615                784-801
42    yes                             547-567                715-735
43    yes                             580-597                844-861
44    No                                                                              4 and 17           fllA
45    yes                             640-657                943-963
46    yes                             565-582                781-801
47    No                                                                              21                 flkA
48    yes                             568-585                835-852
49    yes                             589-609                754-771
50    No                                                                              10
51    yes                             565-582                1042-1059
52    yes                             598-615                829-846
53    No (PCR generated two bands)                                                                       flkA
54    No (non-functional gene)        988-1008**             1344-1364**                                 flmA
55    No                                                                              38                 fllA
56    yes                             697-714                877-897

*  See text for choice of primers for fliC gene of H11.
** See text for choice of primers for fliC gene of H21.
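Table 3 lists primer positions but not the sizes of the expected PCR products. The sketch below estimates product size under assumptions the table does not spell out: positions are 1-based and inclusive on the flagellin gene used for primer choice, and primer 1 anneals to one strand while primer 2 anneals to the complementary strand further downstream, so the product runs from the first base of primer 1 to the last base of primer 2. For H2 (568-587 and 1039-1056) this gives 1056 - 568 + 1 = 489 bp. The helper names are hypothetical.

```python
def parse_range(entry: str) -> tuple[int, int]:
    """Parse a Table 3 position entry such as '568-587', '586-606*'
    or '553-570 (primer #1806)' into (start, end)."""
    core = entry.split("(")[0].strip().rstrip("*")
    start, end = (int(part) for part in core.split("-"))
    if start > end:
        raise ValueError(f"start after end in {entry!r}")
    return start, end


def expected_product_length(primer1: str, primer2: str) -> int:
    """Expected PCR product length in bp, measured between the outer primer ends.

    Assumes 1-based inclusive positions on one gene, primer 1 on one strand
    and primer 2 on the complementary strand downstream of primer 1.
    """
    start1, end1 = parse_range(primer1)
    start2, end2 = parse_range(primer2)
    if start2 <= end1:
        raise ValueError("primer 2 must lie downstream of primer 1")
    return end2 - start1 + 1


print(expected_product_length("568-587", "1039-1056"))  # H2: 489
print(expected_product_length("553-570 (primer #1806)",
                              "1483-1500 (primer #1809)"))  # H7: 948
```

The asterisks and primer numbers are stripped before parsing, so entries can be copied from the table as printed.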
TABLE 4

Pool  Strains of which chromosomal DNA included in the pool                           Source
No.

1     E. coli type strains for O serotypes 1, 2, 3, 4, 10, 16, 18 and 39              IMVS (a)
2     E. coli type strains for O serotypes 40, 41, 48, 49, 71, 73, 88 and 100         IMVS
3     E. coli type strains for O serotypes 102, 109, 119, 120, 121, 125, 126 and 137  IMVS
4     E. coli type strains for O serotypes 138, 139, 149, 7, 5, 6, 11 and 12          IMVS
5     E. coli type strains for O serotypes 13, 14, 15, 17, 19ab, 20, 21 and 22        IMVS
6     E. coli type strains for O serotypes 23, 24, 25, 26, 27, 28, 29 and 30          IMVS
7     E. coli type strains for O serotypes 32, 33, 34, 35, 36, 37, 38 and 42          IMVS
8     E. coli type strains for O serotypes 43, 44, 45, 46, 50, 51, 52 and 53          IMVS
9     E. coli type strains for O serotypes 54, 55, 56, 57, 58, 59, 60 and 61          IMVS
10    E. coli type strains for O serotypes 62, 63, 64, 65, 66, 68, 69 and 70          IMVS
11    E. coli type strains for O serotypes 74, 75, 76, 77, 78, 79, 80 and 81          IMVS
12    E. coli type strains for O serotypes 82, 83, 84, 85, 86, 87, 89 and 90          IMVS
13    E. coli type strains for O serotypes 91, 92, 95, 96, 97, 98 and 101             IMVS

14    E. coli type strains for O serotypes 103, 104, 105, 106, 107, 108 and 110       IMVS
15    E. coli type strains for O serotypes 112, 162, 113, 114, 115, 116, 117 and 118  IMVS
16    E. coli type strains for O serotypes 123, 165, 166, 167, 168, 169, 170 and 171  See b
17    E. coli type strains for O serotypes 172, 173, 127, 128, 129, 130, 131 and 132  See c
18    E. coli type strains for O serotypes 133, 134, 135, 136, 140, 141, 142 and 143  IMVS
19    E. coli type strains for O serotypes 144, 145, 146, 147, 148, 150, 151 and 152  IMVS

a. Institute of Medical and Veterinary Science, Adelaide, Australia.
b. 123 from IMVS; the rest from Statens Serum Institut, Copenhagen, Denmark.
c. 172 and 173 from Statens Serum Institut, Copenhagen, Denmark; the rest from
IMVS.
TABLE 5

Pool  Strains of which chromosomal DNA included in the pool                           Source
No.

20    E. coli type strains for O serotypes 153, 154, 155, 156, 157, 158, 159 and 160  IMVS
21    E. coli type strains for O serotypes 161, 163, 164, 8, 9 and 124                IMVS
22    As pool #21, plus E. coli O111 type strain Stoke W                              IMVS
23    As pool #21, plus E. coli O111:H2 strain C1250-1991                             See d
24    As pool #21, plus E. coli O111:H12 strain C156-1989                             See e
25    As pool #21, plus S. enterica serovar Adelaide                                  See f
26    Y. pseudotuberculosis strains of O groups IA, IIA, IIB, IIC, III, IVA, IVB,     See g
      VA, VB, VI and VII
27    S. boydii strains of serogroups 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14 and 15      See h
28    S. enterica strains of serovars (each representing a different O group)         IMVS
      Typhi, Montevideo, Ferruch, Jangwani, Raus, Hvittingfoss, Waycross,
      Dan, Dugbe, Basel, 65:i:e,n,z15 and 52:d:e,n,x,z15

d. C1250-1991 from Statens Serum Institut, Copenhagen, Denmark.
e. C156-1989 from Statens Serum Institut, Copenhagen, Denmark.
f. S. enterica serovar Adelaide from IMVS.
g. Dr S. Aleksic of Institute of Hygiene, Germany.
h. Dr J. Lefebvre of Bacterial Identification Section, Laboratoire de Santé Publique
du Québec, Canada.
TABLE 6

Pool  Strains of which chromosomal DNA included in the pool                           Source
No.

29    E. coli type strains for O serotypes 153, 154, 155, 156, 158, 159 and 160       IMVS
30    E. coli type strains for O serotypes 161, 163, 164, 8, 9, 111 and 124           IMVS
31    As pool #29, plus E. coli O157 type strain A2 (O157:H19)                        IMVS
32    As pool #29, plus E. coli O157:H16 strain C475-89                               See d
33    As pool #29, plus E. coli O157:H45 strain C727-89                               See d
34                                                                                    See d
35    As pool #29, plus E. coli O157:H39 strain C258-94                               See d
36    As pool #29, plus E. coli O157:H26                                              See e
37    As pool #29, plus S. enterica serovar Landau                                    See f
38    As pool #29, plus Brucella abortus                                              See g
39    As pool #29, plus Y. enterocolitica O9                                          See h
40    Y. pseudotuberculosis strains of O groups IA, IIA, IIB, IIC, III, IVA, IVB,     See i
      VA, VB, VI and VII
41    S. boydii strains of serogroups 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14 and 15      See j
42    S. enterica strains of serovars (each representing a different O group)         IMVS
      Typhi, Montevideo, Ferruch, Jangwani, Raus, Hvittingfoss, Waycross,
      Dart, Dugbe, Basel, 65:i:e,n,z15 and 52:d:e,n,x,z15
43    E. coli type strains for O serotypes 1, 2, 4, 10, 18 and 29                     IMVS
44    As pool #43, plus E. coli K-12 strains C600 and WG1                             IMVS, See k

d. O157 strains from Statens Serum Institut, Copenhagen, Denmark.
e. O157:H26 from Dr R. Brown of Royal Children's Hospital, Melbourne, Victoria.
f. S. enterica serovar Landau from Dr M. Popoff of Institut Pasteur, Paris, France.
g. B. abortus from the culture collection of The University of Sydney, Sydney, Australia.
h. Y. enterocolitica O9 from Dr. K. Bettelheim of Victorian Infectious Diseases Reference
Laboratory, Victoria, Australia.
i. Dr S. Aleksic of Institute of Hygiene, Germany.
j. Dr J. Lefebvre of Bacterial Identification Section, Laboratoire de Santé Publique du
Québec, Canada.
k. Strains C600 and WG1 from Dr. B. J. Bachmann of Department of Biology, Yale
University, USA.
CLAIMS:

1. A nucleic acid molecule encoding all or part of an E.
coli flagellin protein.

2. A method of detecting the presence of E. coli of a
particular H serotype in a sample, the method comprising
the step of specifically hybridising at least one nucleic
acid molecule derived from a flagellin gene, wherein the
at least one nucleic acid molecule is specific for a
particular flagellin gene associated with the H serotype,
to any E. coli in the sample which contain the gene, and
detecting any specifically hybridised nucleic acid
molecules, wherein the presence of specifically hybridised
nucleic acid molecules identifies the presence of the H
serotype in the sample.

3. A method of detecting the presence of E. coli of a
particular H serotype in a sample, the method comprising
the step of specifically hybridising at least one pair of
nucleic acid molecules to any E. coli in the sample which
contains the flagellin gene for the particular H serotype,
wherein at least one of the nucleic acid molecules is
specific for the particular flagellin gene associated with
the H serotype, and detecting any specifically hybridised
nucleic acid molecules, wherein the presence of
specifically hybridised nucleic acid molecules identifies
the presence of the H serotype in the sample.

4. A method for detecting the presence of a particular O
serotype and H serotype of E. coli in a sample, the method
comprising the following steps:

(a) specifically hybridising at least one nucleic
acid molecule, derived from and specific for a gene
encoding a transferase or a gene encoding an enzyme for
the transport or processing of a polysaccharide or
oligosaccharide unit, the gene being involved in the
synthesis of a particular E. coli O antigen, to any E.
coli in the sample which contain the gene;

(b) specifically hybridising at least one nucleic
acid molecule derived from and specific for a particular
flagellin gene associated with that H serotype, to any E.
coli in the sample which contain the gene; and

(c) detecting any specifically hybridised nucleic
acid molecules.

5. A method for detecting the presence of a particular O
serotype and H serotype of E. coli in a sample, the method
comprising the following steps:

(a) specifically hybridising at least one pair of
nucleic acid molecules, derived from and specific for a
gene encoding a transferase or a gene encoding an enzyme
for the transport or processing of a polysaccharide or
oligosaccharide unit, the gene being involved in the
synthesis of a particular E. coli O antigen, to any E.
coli in the sample which contain the gene;

(b) specifically hybridising at least one pair of
nucleic acid molecules derived from and specific for a
particular flagellin gene associated with that H serotype,
to any E. coli in the sample which contain the gene; and

(c) detecting any specifically hybridised nucleic
acid molecules.
30 
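The sequence figure that follows prints one-letter amino acid translations above 60-nucleotide lines. Where an open reading frame is in phase with the start of a line, the printed translation can be checked against the standard genetic code (NCBI translation table 1). The sketch below is a minimal checker under that assumption; the names are hypothetical, and codons containing unreadable characters are returned as X rather than guessed. For example, the orf1 line ending at position 840 reads TTA GAA TGT ... AGG and translates to LECDMKKIVIIGNVASMMLR.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the first base varying slowest, each position in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon.

    Spaces are ignored, and any codon containing a character other than
    A, C, G or T (for example an OCR error) is translated as 'X'.
    """
    seq = dna.upper().replace(" ", "")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq), 3))


line_840 = "TTA GAA TGT GAT ATG AAA AAA ATA GTG ATC ATA GGC AAT GTA GCG TCA ATG ATG TTA AGG"
print(translate(line_840))  # LECDMKKIVIIGNVASMMLR
```

Lines whose reading frame starts part-way along (for example where an orf begins mid-line) need the leading bases trimmed before translation.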



GATCTGATGGCCGTAGGGCGCTACGTGCTTTCTGCTGATATCTGGGCTGAGTTGGAAAAA 60 

ACTGCTCCAGGTGCCTGGGGACGTATTCAACTGACTGATGCTATTGCAGAGTTGGCTAAA 12 0 

aaacagtctgttgatgccatgctgatgaccggcgacagctacgactgcggtaagaagatg 180 

ggctatatgcaggcattcgttaagtatgggcixx:gcaaccttaaagaaggggcgaagttc 240 

cgtaagagcatcaag aagctactgagtgagtag agatttacacgtctttgtgacgataag 300 

CCAGAAAAAATAGCGGCAGTTAACATCCAGGCTTCTATGCTTTAAGCAATGGAATGTTAC 3 60 

TGCCGTTTTTTATGAAAAATGACCAATAATAACAAGTTAACCTACCAAGTTTAATCTGCT 420 

TTTTGTTGGATTTTTTCTTGTTTCTGGTTCCAT^^ 480 

GAGAGTTTTGCGGGATCTCGCGGAACTGCTCACATCTTTGGCATTTAGTTAGTGCA^ 540 

tagctgttaagccaggggcggtagctikk:ctaattaatttttaacgtatacatto 600 

TGCCGCTTATAGCT^TAAAGTCAATCGGATTAAACTTCTTTTCCATTAGGTAAAAGAGT 660 

GTTTGTAGTCGCTCAGGGAAATTGGTTTTGGTAGTAGTACTTTTC AAATTATC 720 



Start of orf1

MLLCCIHINVYYLL 
CGATTTAGATGGC AGTTGMIGTTACTATGCTGCATACATATCAATGTATATTATTTACTT 780 

LECDMKKIVIIGNVASMMLR

TTAGAATGTGATATGAAAAAAATAGTGATCATAGGCAATGTAGCGTCAATGATGTTAAGG 840 

FRKELIMNLVRQGDNVYCLA 
TTCAGGAAAGAATTAATCATGAATTTAGTGAGGCAAGGTGATAATCTATATTGTCT^ 900 

NDFSTEDLKVLS SWGVKGVK 
AATG ATTTTTCCACTGAAGATCTTAAAGTACTTTCGTCATGGGGCGTTAAGGGGG^ 960 

FSLNSKGINPFKDI lAVYEL 
TTCTCTCTTAACTCAAAGGGTATTAATCCITTTAAGGATATAATTGCTC 1020 

KKILKDISPDIVFSYFVKPV 
AAAAAAATTCTTAAGGATATTTCCCC AGATATTGTATTTTCATATTTTGTA^ 1080 

IFGTIASKLSKVPRIVGMIE 
ATATTTGGAACTATTGCTTC AAAGTTGTCAAAAGTGCCAAGGAI^ 1140 

GLGNAFTYYKGKQTTKTKMI 
GGTCTAGGTAATGCCTTGACTTATTATAAGGGAAAGCAGACCACAA^^ 1200 

KWIQILLYKLALPMLDDLIL 
AAGTGGATAC AAATTCTTTTAT ATAAGTTAGCATT ACCGATGCTTGATGAT^^ 1260 

LNHDDK KDLIDQYNIKAKVT 
TTAAATCATGATGATAAAAAAGATTTAATCGATCAGTATAATATTAAAGCTAAGGTAACA 1320 

VLGGIGLDLNEFSYKEP pke 
GTGTTAGGTGGGATTGGATTGGATCTTAATGAGTTTTCATATAAAGAGCC^ 1380 

KITFIFIARLLREKGIFEFI 
AAAATTACCTTTATTTTTATAGCAAGGTTATTAAGAGAGAAAGGGATATTTGAGTT^ 1440 

EAAKFVKTTYPSSEFVILGG

GAAGCCGCAAAGTTCGTTAAGACAACTTATCCAAGTTCTGAATTTGTAATTTTAG^ 1500 
FESNNPFSLQKNEIESLRKE
TTTGAGAGTAATAATCCTTTCTC ATTAC AAAAAAATGAAATTGAATCGCTAAGAAM 1560 

HDLIYPGHVENVQDWLEKS S 
CATGATCTTATTTATCCTGGTCATGTGGAAAATGTTCAAGATTGGTTAGAGAAAAGTTC 1620 

VFVLPTSYREGVPRVIQEAM 
GTTTTTGTTTTACCTACATCATATCGAGAAGGCGTACCAAGGGTGATCCAAGAAGCTATC 1680 

AIGRPVITTNVPGCRDI IND 
GCTATTGGTAGACCTGTAATAACAACTAATGTACCTGGGTGTAGGGATATAATAAATGAT 1740 

GVNGFLIPPFEINLLAEKMK 
GGGGTCAATGGCTTTTTGATACCTCCATTTGAAATTAATTTACTGGCAGA;^^ 1800 

YFIENKDKVLEMGLAGRKFA

TATTTTATTGAGAATAAAGATAAAGTACTCGAAATGGGGCTTGCTGGAAGGAAGT^ 1860 

EKNFDAFEKNNRLASIIKSN

GAAAAAAACTTTGATGCTTTTGAAAAAAATAATAGACTAGC ATO^TAATAAAATC^^ 1920 



End of orf1

N D F * 

AATGATTTT IYSACTTGAGCAGAAATTATTTATATTTCAATCTGAAAAATAAAGGCTG 1980 
Start of orf2

MNKVALITGITGQDGSYLA 
TT ATGA ATAAAGTGGCATTAATTACTGGTATCACTGGGCAAGATGGCTCCTATT^ 2040 

ELLLEKGYEVHGIKRRASSF

AATTATTGTTAGAAAAAGGTTATGAAGTTCATGGTATTAAACGCCGTGCATCTTCATTT^ 2100 

NTERVDHIYQDSHLANPKLF

ATACTGAGCGAGTGGATCACATCTATCAGGATTCACATTTAGCTAATCCTAAACTTTT^ 2160 

LHYGDLTDTSNLTRILKEVQ 
TACACTATGGCGATTTGACAGATACTTCCAATCTGACCCGTATTTTAAAA^ 2220 

PDEVYNLGAMSHVAVSFESP 
CAGATGAAGTTTACAATTTGGGGGCGATGAGCCATGTAGCGGTATCATT^ 2280 

EYTADVDAIGTLRLLEAIRI 
AATACACTGCTGATGTTGATGCGATAGGAACATTGCGTCTTC^ 2340 

LGLEKKTKFYQASTSELYGL 
TGGGGCTIKKSAAAAAAAGACAAAATTTTATCAGGCTTCAACTT^ 2400 

VQEIPQKETTPFYPRSPYAV

TTCAAGAAATTCCACAAAAAGAGACTACGCCATTTTATCCACGT^ 2460 

AKLYAYWITVNYRESYGMFA 
CAAAATTATATGCCTATTGGATCACTGTTAATTATCGTGAGT^ 2 52 0 

CNGILFNHESPRRGETFVTR 
GCAATGGTATTCTCTTTAACCAOSAATCACCTCGCCGTGGCGAGACCTT^ 2580 

KITRGIANIAQGLDKCLYLG 
AAATAACACGCGGGATAGC AAATATTGCTCAAGGTCTTGATAAATGCTTATA 2640 

NMDSLRDWGHAKDYVKMQWM 
ATATGGATTCTCTGCGTGATTGGGGACATGCTAAGGATTATCTCAAAATGCAAT^ 2700 
MLQQETPEDFVIATGIQYSV
TGCTGCAGCAAGAAACTCCAGAAGATTTTGTAATTGCTACAGGAATTCAATATTCTGTCC 2760 

REFVTMAAEQVGIELAFEGE 
GTGAGTTTGTCACAATGGCGGCAGAGCAAGTAGGC ATAGAGTTAGC ATTTGAAGGTG AGG 2820 

GVNEKGVVVSVNGTDAKAVN 
GAGTAAATGAAAAAGGTGTTGTTGTTTCGGTCAATGGC ACTGATGCTAAAGCTGTAAACC 2880 

PGDVI ISVDPRYFRPAEVET 
CGGGCGATGTAATTATATCTGTAGATCCAAGGTATTTTAGGCCTGCAGAAGTTGAAACCT 2940 

LLGDPTNAHKKLGWSPEITL 
TGCTTGGCGATCCTACTAATGCGCATAAAAAATTAGGATGGAGCCCTGAAATTACATTGC 3000 

REMVKEMVSSDLAIAKKNVL 
GTGAAATGGTAAAAGAAATG GTTTCCAGGOATTTAGGAATAGCGAAAAAGAAGOTGTTGG 3060 

End of orf 2 

LKANNIATNIPQE* 

TGAAAGCTAATAAGATTGCGACTAATATTCCGCAAGAA rAAAAAAGATAATAGAOTAAAT 3120 

Start of orf3

M F 

AATTAAAAATGGTGCTAGATOTATTAGTACGATTATTTTTTTTTGGGTOAGTAA^ 3180 

ITSDKFREI IKLVPLVSIDL 
TTACATCAGATAAAOTOAGAGAAATTATCAAGTTAG'rTCGATTAG'rA'rGAA'rTOATG'rCC 3240 

LIENENGEYLFGIiRNNRPAK 
TAAOTGAAAAGGAGAATGGTGAATATOTATOTGGTCTTAGGAATAATGGAGCQGCGAAAA 3300 

NYFFVPGGRIRKNESIKNAF 
ACTATTTTTTTOTTCCAGGTGGTAGGATTCGGAAAAATGAATGTATTAAAAATCCOT'TTA 3360 

KRISSMELGKEYGISGSVFN 
AAAGAATATGATCTATOGAATTAGGTAAAGAQTATGGTATTTCAGGAAGTGTTTTTAATO 3420 

GVWEHFYDDGFFSEGEATHY

IVLCYTLKVLKSELNIiPDDQ 
TAGTGGTlTCTTACACAGTGAAAGTTGTTAAAAGTGAATTQAATGTCGGAGATGATGAAe 3540 

HREYLWLTKHQINAKQDVHN 
ATGGTGAATAGCTTTGGGTAAGTAAAGAGCAAATAAATGGTAAAOAAQATQTTGATAACT 3600 

End of orf3   Start of orf4

YSKNYFL* M 
ATTGAAAAAATOATTTTTTO gaATTTOTAOTAAAAATTAATATGGGAGAGAA'ITCT A gG? 3660 

SQCLYPVI lAGGTGSRLWPL 
CTCAATGTGTTTAGGGTGTAATOATTGCCGGAGGAACGGGAAGGGGTCTATGGGGOTTGT 3720 

SRVLYPKQFLNLVGDSTMLQ 
CTCGAGTAOTATAGGGTAAAGAATTTTTAAATTTAOTTQOOOATTGTAGAATGTTCGAAA 3780 

TTITRLDGIECENPIVICNE 
GAACAATTACGGGTl^GATOOOATGGAATGGGAAAATGGAAOTGOTATGTGCAATOAAG 3840 

DHRFIVAEQLRQIGKLTKNI 
ATCAGGGATOTATTOTAGCAOAOGAATTAGGAGAOATTOGTAAGGTAAGGAAQAATATTA 3900 

ILEPKGRNTAPAIALAAFIA 
TAGTTQAGGGQAAAGGGGGTAATAGTOGAGGTOGGATAOCTTTAQGTGGTTTTATGOGTG 3960 
QKNNPNDDPLLLVLAADHSI
AGAAGAATAATCCTAATCACGACCCTTTATOATTAGTAC'TTOGGGCAGACCAC'rCTATAA 4020 

NNEKAFRESIIKAMPYATSG 
ATAATGAAAAAGGATOTCGAQAGTCAATAATAAAAGCTATGCCaTATGCAACTTCTGGGA 4080 

KLVTFGI IPDTANTGYGYIK 
AGTTAGTAACATT'TGGAATTATTOCGGACACGGGAAA'rACTGG'rTATOGATA'rATTAAGA 4140 

RSSSADPNKEFPAYNVAEFV 
GAAGTTCTTCAGCTGATCGTAATAAAGAATTCCGAQGATATAATGTTGCCGAGTTTGTAG 4200 

EKPDVKTAQEYIS SGNYYWN 
AAAAAGCAGATGTTAAAACAGGACAGGAATATATTTCGACTGGGAATTA'PrACTOGAATA 4260 

SGMFLFRASKYLDELRKFRP 
GCGGAATGTTTTTATTTGGGGGGAGTAAATATGTTGATGAAGTACCGAAATTTAGACGAG 4320 

DIYHSCECATATANIDMDFV 
ATATTTATCATAGCTCTGAATGTGGAACCGGTACAQGAAATATAGATATOGAC'PrTG'rCC 4380 

RINEAEFINCPEESIDYAVM 
GAATTAACGAGGCTGAQOTTAOTAATTGTCGTOAAGAGTCTATGGATTATGCTG'rGA'rOO 4440 

EKTKDAVVLPIDIGWNDVGS 
AAAAAAGAAAAGACGQTGTAGTTGTTGCGATAGATATTGGGTGGAATGAGGTGGGTTG'rT 4500 

WSSLWDISQKDCHGNVCHGD 
GGTCATCACTTTGGGATATAAGCGAAAAGGATTGGGATGGTAA'rGTGTGCGATGGGGA'rO 4560 

VLNHDGENSFIYSES SLVAT' 
TCGTGAATGATGATGGAGAAAATAGTTTTATTTACTG'rGAGTGAAGTGTGGTTGCOAGAG 4620 

VGVSNLVIVQTKDAVLVADR 
TCGGAGTAAGTAATOTAGTAATTGTCCAAACGAAGGATGC'rG'rAGTGG'rTGGGOACGGTG 4680 

DKVQNVKNIVDDLKKRKRAE 
ATAAAGTGGAAAATGTTAAAAAGATAGTTGACQATGTAAAAAAGAGAAAAGGTGGTGAAT 4740 

YYMHRAVFRPWGKFDAIDQG 
AGTAGATGCATGGTCGAGTTTTTGGGCCTTGGGGTAAATTCGATGGAATAGACGAAGGGG 4800 

DRYRVKKI IVKPGEGLDLRM 
ATAGATATAGAGTAAAAAAAAT A ATAGTTAAAGGAGGAGAAOGOTTAGATTOAAGGATGG 4860 

HHHRAEHWIVVSGTAKVSLG 
ATCATGATAGGGOAGAGGATTGGATTG'rTCTATGGGGTAGTGGTAAAQTTa?GAGTAGGTA 4920 

SEVKLLVSNESIYIPQGAKY 
GTOAAGOTAAAGTATTAGTTTGTAATGAGTGTATATATATGGGTGAGGGAGGAAAA'TA'rA - 4980 

SLENPGVIPLHLIEVS SGDY 
GTCITOAGAATCCAGGGGTAATAGGTTTGGATGTAATTGAAOTAAGTTGTOOTOATTA 5040 

LESDDIVRFTDRYNSKQFLK 
TTGAATGAQATGATATAGTGGQTT'FrAGTOAGAGATATAACAGTAAAGAATTCGTAAAGG 5100 



End of orf4   Start of orf5

MN KITCFKAYDIRGRL 

R D * 

OAQATTOATAAATM i gAATAAAATAAGTTGCTTGAAAGGATATOATATAGQTOQQGQTGT 5160 
GAELNDEIAYRIGRAYGEFF
TCGTGGTGAATTGAATCATCAAATAGCATATAGAATTGGTCGCGCTTATGGTGAGCT^ 5220 

KPQTVVVGGDARLTSESLKK 
TAAACCTCAAACTGTAGTTGTGGGAGGAGATCGTCGCCTAACAAG'rGAGAG'rTTAAAGAA - 5280 

SLSNGLCDAGVNVLDLGMCG 
ATCACTGTGAAATGGGC'TATGTGATGCAGGCGTAAATG'PCTTAGATCTTGGAATGTGTGG 5340 

TEEIYFSTWYLGIDGGIEVT 
TACTGAAGAGATATATTTOTCCACTTGGTATTTAGGAATTGATGGTGGAATCGAGGTAAC 5400 

ASHNPIDYNGMKLVTKGARP 
TCCAAGCGATAATCCAATTGATTATAATGGAATGAAATTAGTAACCAAAGGTGCTCGACC 5460 

ISSDTGLKDIQQLVESNNFE 
AATCAGGAGTOACACAGGTCTCAAAGATATAGAAGAAOTAGTAGAGAGfPAATAATTTTGA 5520 

ELNLEKKGNITKYSTRDAYI 
AGAGCTGAACCTAGAAAAAAAAGGGAATAOTACCAAATATTCGACCCGAGATGCCTAGAT 5580 

NHLMGYANLQKIK KIKIVVN 
AAATGATTTGATGGGOTATGCTAATGTGGAAAAAATAAAAAAAATOAAAATAGTTGTGAA 5640 

SGNGAAGPVIDAIEECFLRN 
OTCTGGGAATQGTGGAGCTGGTGCTGTTATTGATGGTATTGAGGAATGGTTTOTAG 5700 

NIPIQFVKINNTPDGNFPHG 
GAATACTCCGATTGAGTOTCTAAAAATAAATAATAGACGGGATGGTAA'rTOTCCAGATGG 5760 

IPNPLLPECREDTSSAVIRH 
TATCCCTAATGGATTACTACCTGAGTGGAGAGAAQATAGGAGGAGTGCGGTTATAAGAGA 5820 

SADFGIAFDGDFDRCF FFDE 
TAGTGCTGATTTTGGTAaTOCATTTGATGGTGATTTTGATAGGTGTTTTTTCTTTC^ 5880 

NGQFIEGYYIVGLLAEVFLG 
AAATGGACAATTTATTOAAGGATAGTAGATTCaTCGTTTATTAGCGGAAGTTTT^^ 5940 

KYPNAKI IHDPRLIWNTIDI 
GAAATATGGAAAGGGAAAAATGATTGATGATGGTGGGCTTA'rA'rGGAA'rACTATTGA'rA'P 6000 

VESHGGIPIMTKTGHAYIKQ 
GGTAGAAAGTGATCGTGGTATACG'rATAATGAGTAAAAGGGGTGATOQ'rTAGAa''rAAGGA 6060 

RMREEDAVYGGEMSAHHYFK 
AAGAATKKJGTQAAGAGGATOGGGTATAaKKSGGGGGAAATGAGTGGGGATCATTATTTTAft - 6120 

DFAYCDSGMIPWILICELLS 
AGATTTTCGATAGTGGQATAGTOGAATCAOTCGTTGGATTTOAATTTO 6180 

LTNKKLGELVCGCINDWPAS 
TCTGACAAATAAAAAATTAQGTQAAGTGGTOTGTGGTTGTATAAAGGAGTGGGGGGG^ 6240 

GEINCTLDNPQNEIDKLFNR 
OKKaAGAAATAAAGTGTAGAGTAGAGAATGGGGAAAATGAAATAGATAAATTATTTAATGQ 6300 

YKDSALAVDYTDGLTMEFSD 
TTAGAAAGATAGTOCGTTAGGTGTTQAOTAGAGTGATGGATTAACTATGGAGTTGTG'TGA 6360 

WRFNVRCSNTEPVVRLNVES 
TTGGGGTTTOAATGTTAGATGGTGAAATAGAGAAGGTOTAG'PAGGAOTGAA'rGTAGAATC 6420 

RNNAILMQEKTEEILNFISK 
TAGQAATAATQOTATTGTTATGGAGQAAAAAAGAGAAQAAATTGTQAATTTTATA 6480 
End of orf5   Start of orf6

* MKVLLTG 
ATAAATTTGCACCTGAGCTCATAATCGGAAGAAGAAATATM;GAjyVGTACTTCTGACT^^ 6540 

STGMVGKNILEHDSASKYNI 
GTCAACTGGGATGGTTGGTAAGAATATA'PrAGAGCATGATAGTGGAAGTAAATATAATAT 6600 

LTPTSSDLNLLDKNEIEKFM 
ACTTACTCCAACCAGCTCTGATTTGAATTTATTAGATAAAAATGAAATAGAAAAATTCAT 6660 

LINMPDCI IHAAGLVGG IHA 
GCTTATCAAGATCGCAGAGTGTATTATAGATGGAGCGGGATTAGTTGGAGGCATTCATGG ' 6720 

NISRPFDFLEKNLQMGLNLV 
AAATATAAGCAGGG C GTTTGATTTTCTGGAAAAAAATTTGCAGATGGGTTTAAA'TTTAG'r - 6780 

SVAKKLGIKKVLNLGSSCMY 
TTCCGTCGCAAAAAAACTAGGTATCAAGAAAGTGCTTAAGTTGGGTAGTTCATGCATGTA 6840 

PKNFEEAIPEKALLTGELEE 
CCCCAAAAACOTTGAAGAGGCTAOTCCTOAGAAAGGTCTGTTAACTGGTGAGGTAGAAGA 6900 

TNEGYAIAKIAVAKACEYIS 
AACTAATGAGGGATATGCTATroCGAAAATTGCTGTAGCAAAAGGATGCQAATATATATC 6960 

RENSNYFYKTI IPCNLYGKY 
AAGAGAAAACTCTAATTATTTTTATAAAAGAAOTATCGCATGTAATOTA'rATOGGAAATA 7020 

DKFDDNSSHMIPAVIKKIHH 
TGATAAATOaXBATGA'TAACTCGTCACA'rATGA'r'rCCGGCAGTTA'rAAAAAAAATCGATGA 7080 

AKINNVPEIEIWGDGNSRRE 
TGGGAAAAOTAATAATGTCCGAGAGATCGAAATTTOGGGGGATGGTAAOTGGGGGGGTGA ' 7140 

FMYAEDLADLIFYVIPKIEF 
GOTTATGTATGGAGAAGATTTAGCTGATCTOATTTTOTATGTTATTGGTAAAAT^ 7200 

MPNMVNAGLGYDYSINDYYK 
GATGCGTAATATGGTAAATGCTGGTTTAGGTTACGATTATTGAATTAATQAGTATTATAA 7260 

IIAEEIGYTGSFSHDLTKPT 
GATAATTCGAGAAGAAATTCGOTATAGTGGGAGTTTTTCTCATGATTTAAGAA^ 7320 

GMKRKLVDISLLNKIGWSSH 
AGGAATGAAAGGGAAGGTAGTAGATATTOGATTGGTTAATAAAATTGGTTGGTGAAGTGA 7380 

FELRDGIRKTYNYYLENQNK 
CTTTGAAGTGAGAGATGGCATCAGAAAGACCTATAA'rTATTACTTGGAGAATGAAAA'rAA 7440 



Start of orf7. End of orf6

MITYPLASNTWDEYEYAAIQ 

MIGACTAGATAGGGAGTTQGTAGTAATACTTGGGATGAATATOAOTATOGAGGAATAGAO 7500 

SVIDSKMFTMGKKVELYEKN 
TCAQTAATTGAGTGAAAAATGTTTAGGATGQGTAAAAAQGTTGAO'rTATATGAGAAAAA'r 7560 

FADLFGSKYAVMVS SGSTAN 
TaTOGTQATTTOOTTOGTAGGAAATATQGGQTAATGGTTAGGTCTO^ 7620 
LLMIAALFFTNKPKLKRGDE
eTGTTAATGATTGCTGCCCT'TT'rC'rTCACTAATAAAGCAAAACTTAAAAGAGGTGATCAA " 7680 

IIVPAVSWSTTYYPLQQYGL 
ATAATAGTACCTGCAGTGTCATGGTCTACGAGATATTACCCTGTGCAACAGTATGGCTTA 7740 

KVKFVDINKETLNIDIDSLK 
AAGGTGAAGTTTGTCGATATCAATAAAGAAACTTTAAATATTGATATCGATAGTTTGAAA 7800 

NAISDKTKAILTVNL LGNPN 
AATGCTATTTCAGATAAAACAAAAGCAATAOTGACAGTAAATTTATTAGGTAATCCTAAT 7860 

DFAKINEI INNRDIILLEDN 
OATTTTGGAAAAATAAATGAGATAATAAATAATAGGGATATTATCTTAGTAGAAGATAAC 7920 

CESMGAVFQNKQAGTFGVMG 
TCTGAGTCGATGGGCGGGGTCTTTCAAAATAAGCAGGGAGGCAGATTCGGAGTTATGGGT 7980 

TFSSFYSHHIATMEGGCVVT 
ACGTTTAGTTCTTTTTACTGTCATCATATAGCTACAATGGAAGGGQGCTGCGTAGTTACT 8040 

DDEELYHVLLCLRAHGWTRN 
GATGATGAAGAGCTGTATCATGTATTOTTGTGCGTTCGAGG'TGATGG'r'rGGAGAAGAAA'r 8100 

LPKENMVTGTKSDDIFEESF 
OTACCAAAA G A G AATATGGiq ' ACAGGCAGTAAGAGTGATCATATT'PrCGAAGAGTCGT'rT 8160 

KFVLPGYNVRPLEMSGAIGI 
AAGTTTGTTTTACCAGGATAGAATGTTGGCCCAGTTGAAATGAGTGGTGCTATTGGGATA 8220 

EQLKKLPGFISTRRSNAQYF 
GAGGAAGTTAAAAAGTTACCAGGTTTTATATCCACCAGACGTTCGAATGGAGAATAT'FrT 8280 

VDKFKDHPFLDIQKEVGES S 
GTAGATAAATTTAAAGATCATCGATTCGTTGATATAGAAAAAOAAGTTGGTGAAAGTAGO 8340 

WFGFSFVIKEGAAIERKSLV 
TGGTTTCGTTTTTCCTTCGTTATAAAGGAGGGAGGTGGTATTGAGAGGAAGAGTTTAGTA 8400 

NNLISAGIECRPIVTGNFLK 
AATAATCTGATCTGAGGAOGCATTGAATCGCGAGCAATTGTTAGTOGGAATTTOGTC^ 8460 

NERVLSYFDYSVHDTVANAE 
AATGAACGTOTTTTGAGTTATTT'PGATTAGTGTGTAGATGATACGGTAGCAAATGCCGAA - 8520 

YIDKNGFFVGNHQIPLFNEI 
TATATAGATAAGAATGGTTTTTTTGTCGGAAACGAGGAGATAGCTTTQTTTAATQAAATA 8580 



End of orf7

DYLRKVLK* 

GATTATCTAGGAAAAGTATTAAAArAACTAACGAGGGACTGTATTTGGAATAGAGTGCCT 8640 



Start of orf 8 

MVLTVKKILAFGYSKVLP 
TTAAQMIGGTATTAACAGTOAAAAAAATTTTAGCGTTTGGCTATTCTAAAQTAGTACGAG 8700 

PVIEQFVNPICIFI ITPLIL 
CGGTTATTGAAGAGTTTQTGAATGCAATTTGGATGTTGATTATCACAGGAGTAATAGTGA 8760 

NHLGKQSYGNWILLITIVSF 
AGGAGGTOGGTAAGGAAAGGTATOGTAATTGOATTTTATTAATTAGTATTO'TATGTTTTT 8820 
SQLICGGCSAWIAKIIAEQR
CTGAQTTAATATGTGGAGGATGTTCCGCATGGATTGCAAAAATCATTGCAGAACAGAGAA 8880 



ILSDLSKKNALRQISYNFSI 
TTCTTAGTGATTTATCAAAAAAAAATGCTTTACGTGAAA'm'GGTATAATTTTTGAATTG 8940 

VIIAFAVLISFLILSICFFD 
TTATTATCGCATTTGCGGTATTGATTTCTTTTCTTATATOAAGTATTTCTTTCTO^ 9000 

VARNNSSFLFAIIICGFFQE 
TTGCGAGGAATAATTCTTCATTCTTAOTCGCGATTATTATTTGTGGTTTTTTTGAGGA^ 9060 

VDNLFSGALKGFEKFNVSCF 
TTGATAATTTATTTAGTGGTGCGCTAAAAGGTTTTGAAAAATOTAATGTATCATGTTT 9120 

FEVITRVLWASIVIYGIYGN 
TTGAAGTAATTACAAGAGTGCTCTGGGCTTGTATAGTAATATATGGGATTTACGGAAATG 9180 

ALLYFTCLAFTIKGMLKYIL 
CACTGTTATATTTTACATC'PrTAGGer'FrACGA'rTAAAOGTATGGTAAAATATATTCTTG 9240 

VCLNITGCFINPNFNRVGIV 
TATGTCTGAATATTAGGGGTTGTTTGATGAATCOTAATTTTAATAGAGTTGGGATTGTTA 9300 

NLLNESKWMFLQLTGGVSLS 
ATTTGTTAAATGAGTCAAAATGGATGTTTGTTGAATTAACTGGTGGGGTGTCACT'rAGTT 9360 

LFDRLVIPLILSVSKLASYV 
TGTTTGATAGOGTGGTAATACCATTGATTTTATCTGTCAGTAAACTGGGTTGOTATGTCC 9420 

PCLQLAQLMFTLSASANQIL 
CTTGCCTTCAACTAGCTCAATTGATGTTGACTGOTTCTGGGTCTGGAAATGAAATA'rTAC 9480 

LPMFARMKASNTFPSNCF FK 
TACGAATGTTTCCTAGAATGAAAGGATGTAAGAGATTOGCGTGTAATTGT'TTTTTTAAAA 9540 

ILLVSLISVLPCLALFFFGR 
TTCTGGITOTATCACTAATTTCTGTTTTGGGTTGTGTTGGGTTATTGT^ 9600 

DILSIWINPTFATENYKLMQ 
ATATATTAfIK:AATATCGATAAAGGGTAGACTTGGAACTGAAAATTATAAATOAATGGAAA 9660 

ILAISYILLSMMTSFHFLLL 
TOTTAGGTATAAGTTAGATTTTATTGTGAATCATQAGATCTOTTGATTTGTTC 9720 

GIGKSKIiVANLNLVAGLALA 
GAATTGGTAAATCTAAGGTTGTTGGAAATTTAAATCTGGTTCGAGGGGTCGGAGTTGGTG 9780 

ASTLIAAHYGLYAISMVKI I 
GOTGAAGGTTAATGGCAQGTCATTATOGGCTTTATCCAA'rATG'rATGGTAAAAA'rAATAT 9840 

YPAFQFYYLYVAFVYFNRAK 
ATGCGGGTOTTGAATTTTATTAGGTTTATCTAGGTflTOGTCTATTTOAAT^^ 9900 



Start of orf9. End of orf8

MSIDLLFSITEIAIVFSCTI 
N V Y * 

^aSTGTATg^TTTAGTTTOTTGAATTAGTGAAATGGGAAOTCCTTTOTGlTO 9960 

YIFTQCLLMRRIYLDKSILI 
TAGATATTTAGTGAATGTTTQ TTAATGCGGAGGATCTATTTAGATAAAAGTATTTTAATT 10020 

LLCLLFFLVIIQLPELNVNG 
CTTTTATGCTTGCTCTTTTTTTTAGTAATC ATTCAACTTCCTC 10080 



Figure 5/8 



LVDSLKLSLPLLMVFIAFQK 
TTGGTCGATTCTTTAAAGTTATCACTGCCTTTATTGATGGTCTTTATCGCTTTTC;^^ 10140 

PKLCLWVIIALLFLNSAFNF 
CCGAAATTATGCTTGTGGGTTATTATTGCATTGTTGTTTTTGAACTCTGC 10200 

LYLKTFDKFS SFPFTFFILL 
TTATATTTAAAGACATTCGATAAGTTTAGCTCATTTCCTTTTACTTTTTTTATATTGCTG 102 60 

FYLFRLGIGNLPVYKNKKFY 
TTTTACTTGTTTAGATTGGG AATTGGTAATTTACCGGTTTATAAAAATAAAAAATTTTAC 10320 

ALIFLFILIDIMQSLLINYR 
GCGTTGATTTTTCTCTTTATATTAATAGACATAATGCAGTCATTGTTAAT;^ 10380 

GQILYSVICILILVFKVNLR 
GGGCAGATTTTATATTCCGTAATTTGCATCCTGATACTTGTGTTTAAAGTTAATTTAAGA 10440 

KKIPYFFLMLPVLYVI IMAY 
AAAAAGATTCCATACTTTTTTTTAATGCTGCC AGTTTTATATGTAATTATTATC^ 10500 

IGFNYFNKGVTFFEPTASNI 
ATTGGTTTTAATTATTTC AATAAAGGCGTAACTTTTTTTGAACCTACAGCAAGTAATATT 10560 

ERTGMIYYLVSQLGDYIFHG 
GAACGTACGGGGATGATATATTATTTGGTTTCACAGCTTGGTGATTATATATTC 10620 

MGTLNFLNNGGQYKTLYGLP 
ATGGGGACATTAAATTTCTTAAATAACGGCGGACAATATAAGACGTTATATGGACTTCCA 10680 

SLIPNDPHDFLLRFFISIGV 
TCATTAATTCCTAATGACCCTCATGATTTTTTATTACGGTTCTTTATAAGTATTC^ 10740 

IGALVYHSIFFVFFRRISFL 
ATAGGAGCATTGGTTTATCATTCTATATTTTTTGTTTTTTTO 10800 

LYERNAPFIVVSCLLLLQVV 
TTATATGAGAGAAATGCTCCTTTCATTGTTGTAAGTTGTTTGTTACTGTTACAAGT^ 10860 

LIYTLNPFDAFNRLICGLTV 
TTAATTTATACATTAAACCCTTTTGATGCTTTTAATCGATTGATTO^ 10920 



Start of orf10. End of orf9

GVVYGFAKIR* 

MDLQKLDKYTCNGNLDA 
GG AGTTGTTTMGGATTTGCAAAAATTAGA TAAGTATACCTGTAATGGAAATTTAGACGC 10980 

PLVSIIIATYNSELDIAKCL 
TCCACTTGTTTCAATAATCATTGCAACTTATAATTC^^ 11040 

QSVTNQSYKNIEI IIMDGGS 
GCAATCGGTAACTAATCAATCTTATAAGAATATTGAAATCATAATAATGGATGGAG^ 11100 

SDKTLDIAKSFKDDRIKIVS 
TTCTGATAAAACGCTTGATATTGCAAAATCGTTTAAAGACGACCGAATAAAAATAG 11160 

EKDRGIYDAWNKAVDLSIGD 
AGAGAAAGATCGTGGAATTTATGATGCTTGGAATAAAGCAGTTGATTTATCCATI^ 11220 

WVAFIGSDDVYYHTDAIASL 
TTGGGTAGCATTTATTGGTTCAGATGATGTTTACTATOVTACAGATGCAATT^ 11280 

MKGVMVSNGAPVVYGRTAHE 
G ATGAAGGGGGTTATGGTATCTAATGGCGCCCCTGTGGTTTATGGGAGGACAGCGCACGA 11340 



Figure 5/9 



GPDRNI SGFSGSEWYNLTGF 
AGGTCCCGATAGG AAC ATATCTGGATTTTC AGGCAGTGAATGGTACAACCTAAC AGGATT 11400 

KFNYYKCNLPLPIMSAIYSR 
TAAGTTTAATTATTACAAATGTAATTTACCATTGCCCATTATGAGCGC AATATATTCTCG 11460 

DFFRNERFDIKLKIVADADW 
TGATTTCTTCAGAAACGAACGTTTTGATATTAAATTAAAAATTGTTGCTCACGCT<^ 11520 

FLRCFIKWSKEKSPYFINDT 
GTTTCTGAGATGTTTC ATCAAATGGAGTAAAGAGAAGTC ACCTTATTTTATTAATGACAC 11580 

TPIVRMGYGGVSTDISSQVK 
GACCCCTATTGTTAGAATGGGATATGGTGGGGTTTCGACTGATATTTCTTCTCAAGTTAA 11640 

TTLESFIVRKKNNISCLNIQ 
AACTACGCTAGAAAGTTTCATTGTACGCAAAAAGAATAATATATCCTGTTTAAACATACA 11700 

LILRYAKILVMVAIKNIFGN 
GCTGATTCTTAGATATGCTAAAATTCTGGTGATGGTAGCGATC AAAAATATTTTTGGCA^ 11760 

NVYKLMHNGYHSLKKIKNKI 
TAATGTTTATAAATTAATGCATAACGGGTATCATTCCCTAAAGAAAATCAAGAATAAAAT 11820 

Start of orf11. End of orf10

MKIVYI ITGLTCGGAEHLMT 

M5SAAGATTGTTTATATAATAACCGGGCTTACTTGTGGTGGAGCCGAAC ACCTTATG ACG 11880 

QLADQMFIRGHDVNI ICLTG 
C AGTTAGCAGACCAAATGTTTATACGCGGGCATGATGTTAATATT ATTTGTCTAACTGGT 11940 

ISEVKPTQNINIHYVNMDKN 
ATATCTGAGGTAAAGCCAACACAAAATATTAATATTCATTATGTTAATATGGATAAAAAT 12000 

F R S F F RALFQVKKIIVALKP 
TTTAGAAGCTTTTTTAGAGCTOT'ATTTCAAGTAAAAAAAATAATTGTC 12060 

DIIHSHMFHANIFSRFIRML 
GATATAATACATAGTCATATGTTTCATGCTAATATTTTTAGTCGTTTTATTAGGATGCTG 12120 

IPAVPLICTAHNKNEG GNAR 
ATTCCAGCGGTGCCCCTGATATGTACCGCACACAACAAAAATGAAGGTGGCAATGCAAGG 12180 

MFCYRLSDFLASITTNVSKE 
ATGTTTTGTTATOSACTGAGTGATTTTTTAGCTTCTATTACTACAAATGT;^ 12240 

AVQEFIARKATPKNKIVEIP 
GCTGTTCAAGAGTTTATAGCAAGAAAGGOTACACCTAAAAATAAAATAGTAGAGATTCCG 12300 

NFINTNKFDFDI NVRKKTRD 
AATTTT ATTAATAC AAAT AAATTTGATTrroAT ATTAATGTCAGAAAG AAAACGCG^ 12360 

AFNLKDSTAVIiLAVGRLVEA 
GCTTTTAATTTGAAAGACAGTACTVGCAGTACTGCTCGCAGTAGGAAGACTTGTTGA^ 12420 

KDYPNLLNAINHLILSKTSN 
AAAGACTATCCGAACTTATTAAATGCAATAAAlKiyVTTTGATTCTT^^ 12480 

CNDFILLIAGDGALRNKLLD 
TGTAATG ATTTTATTTTGCTTATTGCIX3GCGATGGCGC 12540 

LVCQLNLVDKVF FLGQRSDI 
TTGGTTTGTCAATTGAATCTTGTGGATAAAGTTTTCTTCI^^ 12600 
KELMCAADLFVLSSEWEGFG
AAAGAATTAATGTGTGCTGCAGATCTTTTTGTTTTGAGTTCTGAGTGGGAAG^ 12660 

LVVAEAMACERPVVATDSGG 
CTCGTTGTTGCAGAAGCTATGGCGTGTGAACGTCCCGTTGTTGCTACCGATTCTGGTGGA 12720 

VKEVVGPHNDVIPVSNHIL L 
GTTAAAGAAGTCGTTGGACCTCATAATGATGTTATCCCTGTCAGTAATCATATTCTGTTC 12780 

AEKIAETLKIDDNARKIIGM 
GCAGAGAAAATCGCTGAGAC ACTTAAAATAGATGATAACGCAAGAAAAATAATAGGTATG 12840 

KNREYIVSNFSIKTIVSEWE 
AAAAATAGAGAATATATTGTTTCCAATTTTTCAATTAAAACGATAGTGAGTGAGTGGGAG 12900 



End of orf11

RLYFKYSKRNNIID* 
CGCITATATTTTAAATATTCCAAGCGTAATAATATAATTGAT TGEAAAATATAAGTTTGTA 12960 

CTCTGGATGCAATAGTTTCTCTATGCTCTTTTTTTACTGGCTCCGTATT^ 13 020 

CTGGATTTTGTTATATATCAGTATTAATCTGTCTCAACTTCATCTAGACTACATTCAAG 13080 



Start of gnd 

M S K Q Q I 

CGCGCATGCGTCGCGCGGTGACTACACCTGACAGGAGTATGTAATGTCCAAGCAACAGAT 13140

GVVGMAVMGRNLALNIESRG 
CGGCGTCGTCGGTATGGC AGTGATGGGGCGCAACCTGGCGCTCAACATCGAAAGCCGCGG 13200 

YTVSIFNRSREKTEEVVAEN 
TTATACCGTCTCCATCTTC AACCGCTCCCGCGAGAAAACTGAAGAAGTTGTTGCCGAGAA 13260 

PDKKLVPYYTVKEFVESLET 
CCCGGATAAGAAACTGGTTCCTTATTACACGGTGAAAGAGTTCGTCGAGTCTCTTGAAAC 13320 

PRRILLMVKAGAGTDAAIDS 
CCCACGTCGTATCCTGTTAATGGTAAAAGCAGGGGCGGGAACTGATGCTGCTATCGATTC 13380

LKPYLDKGDI IIDGGNTFFQ 
CCTGAAGCCGTATCTGGATAAAGGCGACATCATTATTGATGGTGGCAACACCTTCTTCCA 13440 

DTIRRNRELSAEGFNFIGTG 
GGACACTATCCGTCGTAACCGTGAACTGTCCGCGGAAGGCTTTAACTTCATCGGTACCGG 13500 

VSGGEEGALKGPSIMPGGQK 
CGTGTCCGGCGGTGAAGAGGGCGCCCTGAAAGGCCCATCTATCATGCCAGGTGGCCAGAA 13560 

EAYELVAPILTKIAAVAEDG 
AGAAGCGTATGAGCTGGTTGCGCCTATCCTGACCAAGATTGCrcCGGTT^^ 13620 

EPCITYIGADGAGHYVKMVH 
CGAACCATGTATAACTTACATCGGTGCTGACGGTGCGGGTCACTACGTGAAGATGGT^^ 13680 

NGIEYGDMQLIAEAYSL LKG 
CAACGGTATCGAATATGGCGATATGCAGCTGATTGCTGAAGCCTATTCTCTGCTTAAAGG 13740 

GLNLSNEELATTFTEWNEGE 
CGGCCTTAATCTGTCTAACGAAGAGCTGGCAACCACTTTTACCGAGTGGAATC 13800 

LSSYLIDITKDIFTKKDEEG 
GCTAAGTAGCTACCTGATTGACATCACCAAAGACATCTTCACCAAAAAAGATGAAGAGGG 13860
KYLVDVILDEAANKGTGKWT
TAAATACCTGGTTGATGTG ATCCTGGACGAAGCTGCGAAC AAAGGC ACCGGTAAATGGAC 13920 

SQSSLDLGEPLSLITESVFA 
CAGCCAGAGCTCTCTGGATCTGGGTGAACCGCTGTCGCTGATCACCGAATCCGTATTCGC 13980 

RYISSLKDQRIAASKVLSGP 
TCGCTACATCTCTTCTCTGAAAGACCAGCGCATTGCGGCATCTAAAGTGCTGTCTGGTCC 14040 

QAKLAGDKAEFVEKVRRALY 
GCAGGCTAAACTGGCTGGTGATAAAGCAGAGTTCGTTGAG AAAGTCCGTCGCGCGCTGTA 14100 

LGKIVSYAQGFSQLRAASDE 
CCTGGGTAAAATCGTCTCTTATGCCCAAGGCTTCTCTC AACTGCGTGCCGCGTCTGACGA 14160 

YNWDLNYGEIAKIFRAGCI I 
ATACAACTGGGATCTGAACTACGGCGAAATCGCGAAGATCTTCCGCGCGGGCTGCATCAT 14220 

RAQFLQKITDAYAENKGIAN 
TCGTGCGCAGTTCCTGCAGAAAATTACTGAOSCGTATGCTGAAAAC AAAGGC ATTGCTi^ 14280 

LLLAPYFKNIADEYQQALRD 
CCTGTTGCTGGCTCCGTACTTCAAAAATATCGCTGATGAATATCAGCAAGCGCTGCGTGA 14340 

VVAYAVQNGIPVPTFSAAVA 
TGTAGTGGCTTATGCTGTGCAGAACGGTATTCCGGTACCGACCTTCTCTGCAGCGGTAGC 14400 

YYDSYRSAVLPANLIQAQRD 
CTACTACGACAGCTACCGTTCTGCGGTACTGCCGGCTAATCTGATTCAGGCACAGCGTGA 14460 

YFGAHTYKRTDKEGVFHTG 
TTACTTCGGTGCGC AC ACGTATAAACGCACTGATAAAGAAGGTGTGTTCC ACACCG 14516 
GTAACCAAGGGCGGTACGTGCATAAATTTTAATGCTTATCAAAACTATTAGCATTAAAAA 60



Start of orf1

MNKETVSIIMPVYN 
TATATAAGAAATTCTCAAATGAACAAAGAAACCGTTTCAATAATTATGCCCGTTTACAAT 120

GAKTIISSVESIIHQSYQDF 
GGGGCCAAAACTATAATCTCATCAGTAGAATCAATTATACATCAATCTTATCAAGATT^ 180 

VLYI IDDCSTDDTFSLINSR 
GTTTTGTATATCATTGACG ATTGTAGCACCGATGATAC ATTTTCATTAATC AACAGTCGA 240 

YKNNQKIRILRNKTNLGVAE 
TACAAAAACAATCAG AAAATAAGAATATTGCGTAACAAGACAAATTTAGGTGTTGCAGAA 300 

SRNYGIEMATGKYISFCDAD 
AGTCGAAATTATGGAATAGAAATGGCCACGGGGAAATATATTTCTTTTTC 360 
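The number at the right of each DNA line appears to give the coordinate of that line's last base, with full lines holding 60 bases. A small hypothetical helper, audit_line, assumes that convention and flags lines damaged during extraction, for example by truncation (as in the line above, which carries only 50 bases before the coordinate 360) or by stray non-ACGT characters.

```python
VALID_BASES = set("ACGT")


def audit_line(line: str, width: int = 60) -> list[str]:
    """Check one numbered listing line of the form '<bases> <end coordinate>'.

    Internal spaces are treated as layout only. Returns a list of problems;
    an empty list means the line looks intact under the assumed convention.
    """
    *chunks, coord = line.split()
    bases = "".join(chunks).upper()
    problems = []
    bad = sorted(set(bases) - VALID_BASES)
    if bad:
        problems.append("unexpected characters: " + "".join(bad))
    if len(bases) != width:
        problems.append(f"{len(bases)} bases, expected {width}")
    if not coord.isdigit():
        problems.append(f"coordinate {coord!r} is not an integer")
    return problems


ok = audit_line("GTTTTGTATATCATTGACG ATTGTAGCACCGATGATAC ATTTTCATTAATC AACAGTCGA 240")
short = audit_line("AGTCGAAATTATGGAATAGAAATGGCCACGGGGAAATATATTTCTTTTTC 360")
print(ok)     # prints []
print(short)  # prints ['50 bases, expected 60']
```

A coordinate split by a stray space is also flagged, because its first fragment is read as non-ACGT characters. The final line of a figure may legitimately be shorter than 60 bases.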

DLWHEKKLERQIEVLNNECV 
GATTTGTGGCACGAGAAAAAATTAGAGCGTCAAATCGAAGTGTTAAATAATGAATGTGTA 420 

DVVCSNYYVIDNNRNIVGEV 
GATGTGGTATGTTCTAATTATTATGTTATAGATAACAATAGAAATATTGTTGGCGAAGTT 480 

NAPHVINYRKMLMKNYIGNL 

AATGCTCCTCATGTGATAAATTATAGAAAAATGCTCATGAAAAACTACATAGGGAATTTG 540

TGIYNANKLGKFYQKKIGHE
ACAGGAATCTATAATGCCAACAAATTGGGTAAGTTTTATCAAAAAAAGATTGGTCACGAG 600

DYLMWLEI INKTNGAICIQD
GATTATTTGATGTGGCTGGAAATAATTAATAAAACAAATGGTGCTATTTGTATTCAAGAT 660

NLAYYMRSNNSLSGNKIKAA
AATCTGGCGTATTACATGCGTTCAAATAATTCACTATCGGGTAATAAAATTAAAGCTGCA 720

KWTWSIYREHLHLSFPKTLY
AAATGGACATGGAGTATATATAGAGAACATTTACATTTGTCCTTTCCAAA^ 780

YFLLYASNGVMKKITHSLLR 
TATTTTTTATTATATGCTTCAAATGGAGTCATGAAAAAAATAACACATTCA 840 



Start of orf2. End of orf1

R K E T K K * 

VKSAAKLIFLFLFT 
AGAAAGGAGACTAAAAA£23SAAGTCAGCGGCTAAGTTGATTTTTTTAT^ 900 

LYSLQLYGVI IDDRITNFDT
TTTATAGTCTCCAGTTGTATGGGGTTATCATAGATGATCGTATAACAAATTTT^ 960 

KVLTSIIIIFQIFFVLLFYL

AGGTATTAACTAGTATTATAATTATATTTCAGATTTTTTTTGTT^ 1020 

TI INERKQQKKFIVNWELKL 
CGATTATAAATGAAAGAAAACAGCAGAAAAAATTTATCGTGAACTGGGAGCTAAAGTTAA 1080 

ILVFLFVTIEIAAVVLFLKE 
TACTCGTTTTCCTTTTTGTGACTATAGAAATTGCTGCTGTAGT^ 1140 

GIPIFDDDPGGAKLRIAEGN 
GTATTCCTATATTTGATGATGATCCAGGGGGGGCTAAACTTAGAATAGCTGAAGGTAATG 1200 
GLYIRYIKYFGNIVVFALII
GACTTTACATTAGATATATTAAGTATTTTGGTAATATAGTTGTGTTTGC ATTAATTATTC 1260 

LYDEHKFKQRTI IFVYFTTI 
TTTATGATGAGCATAAATTCAAACAGAGGACCATCATATTTGTATATTTTACAACGATTC 1320 

ALFGYRSELVLLILQYILIT 
CTTTATTTGGTTATCGTTCTGAATTGGTGTTGCTCATTCTTCAATATATAT^ 1380 

NILSKDNRNPKIKRI IGYFL 
ATATCCTGTCAAAGGATAACCGTAATCCTAAAATAAAAAGAATAATAGGGTATTTTTTAT 1440 

LVGVVCSLFYLSLGQDGEQN 
TGGTAGGGGTTGTATGCTCGTTGTTTTATCTAAGTTTAGGACAAGACGGAGAACAAAATG 1500 

DSYNNMLRI INRLTIEQVEG 

ACTC ATATAATAATATGTTAAGGATAATTAATAGGTTAACAATAGAGC AAGTTGAAGGTG 1560 

VPYVVSESIKNDFFPTPELE 
TTCCATATGTTGTTTCTGAATCTATTAAGAAaSATTTCTT^ 1620 

KELKAIINRIQGIKHQDLFY 
AGGAATTAAAAGCAATAATAAATAG AATACAGGGAATAAAGCATCAAGACTTATTTTATG 1680 

GERLHKQVFGDMGANFLSVT 
GAGAACGGTTACATAAACAAGTATTTGGAGACATGGGAGCAAATTTTTTATCTVGT^^ 1740 

TYGAELLVFFGFLCVFIIPL 
CGTATGGAGCAGAACTGOTAGTTTTTTTTGGTTTTCTCTGTGTATTCAT^ 1800 

GIYIPFYLLKRMKKTHSSIN 

GG ATATATATACCTTTTTATCTTTTAAAGAGAATGAAAAAAACCCATAGCTCGATAAATT 1860 

CAFYSYI IMILLQYLVAGNA 
GCGCATTCTATTCATATATCATTATG ATTTTATTGCAATACTTAGTGGCTGGGAAT^ 1920 

SAFFFGPFLSVLIMCTPLIL 
CGGCCTTCTTTTTTGGTCCTTTTCTCTCCGTATTGATAATGTGTACTC^ 1980 



Start of orf3

MKISVITVTY 
LHDTLKRLSRNENISYNCDL 
TGCATGATACGCTAAAGAGATTATCACGAAATGAAAATATCAGTTATAACTGTGACTTAT 2040



End of orf2

NNAEGLEKTLSSLSILKIKP 

AflTAATGCTGAAGGGTTAGAAAAAACTTTAAGTAGTTTATCAATTTTAAAA^ 2100 

FEIIIVDGGSTDGTNRVISR 
TTTGAGATTATTATAGTTGATGGCGGCTCTACAGATGGAACGAATCGTGTCATO^ 2160 

FTSMNITHVYEKDEGIYDAM 
TTTACTAGTATGAATATTACACATGTTTATGAAAAAGATGAAGGGATATATGATG^ 2220 

NKGRMLAKGDLIHYLNAGDS 
AATAAGGGCCGAATGTTGGCCAAAGGCGACTTAATACATTATTTAAACGCCGGCGATAGC 2280 

VIGDIYKNIKEPCLIKVGLF 
GTTVATTGGAGATATATATAAAAATATCAAAGAGCCATGTTTGATTAAAGTTGGCCT^ 2340 

ENDKLLGFSSITHSNTGYCH
GAAAATGATAAACTTCTGGGATTTTCTTCTATAACCCATTCAAATACAGGG^^ 2400 



Figure 6/2 



QGVIFPKNHSEYDLRYKICA 
CAAGGGGTGATTTTCCCAAAGAATCATTCAGAATATGATCTAAGGTATAAAATATGTGCT 2460 

DYKLIQEVFPEGLRSLSLIT 
GATTATAAGCTTATTCAAGAGGTGTTTCCTGAAGGGTTAAGATCTCTATCTTTGATTACT 2520 

SGYVKYDMGGVSSKKRILRD 
TCGGGTTATGTAAAATATG ATATGGGGGGAGTATCTTCAAAAAAAAGAATTTTAAGAGAT 2580 

KELAKIMFEKNKKNLIKFIP 
AAAGAGCTTGCCAAAATTATGTTTGAAAAAAATAAAAAAAACCTTATTAAGTTTATTCC^ 2640 

ISIIKILFPERLRRVLRKMQ 
ATTTCAATAATCAAAATTTTATTCCCTGAACGTTTAAGAAGAGTATTGCGGAAAATGCAA 2700 



Start of orf4. End of orf3
YICLTLFFMKNSSPYDNE* 

M I M N K I 

TATATTTGTCTAACTTTATTCTTCATGAAGAATAGTTCACCATATGATAATO 2760 

KKILKFCTLKKYDTSSALGR 
CAAAAAAATACTTAAATTTTGCACTTTAAAAAAATATGATACATCAAGTGCTTTAGGTA^ 2820 

EQERYRI ISLSVISSLISKI 
AGAAC AGGAAAGGTACAGGATTATATCCTTGTCTGTTATTTCAAGTTTGATTAGTAAAAT 2880 

LSLLSLILTVSLTLPYLGQE 
ACTCTCACTACTTTCTCTTATATTAACTGTAAGTTTAACTTTACCTTATTTAGGAC^^ 2940 

RFGVWMTITSLGAALTFLDL 
GAGATTTGGTGTATGGATGACTATTACCAGTCTTGGTGCTGCTCTGACATTT^ 3000 

GIGNALTNRIAHSFACGKNL 
AGGTATAGGAAATGC ATTAAC AAACAGGATCGCACATTCATTTGCGTGTGGCAAAAATTT 3060 

KMSRQISGGLTLLAGLSFVI 
AAAGATGAGTCGGCAAATTAGTGGTGGGCTCACTTTGCTGGCTGGATTATCGTTTGTCAT 3120 

TAICYITSGMIDWQLVIKGI 
AACTGCAATATGCTATATTACTTCTGGCATGATTGATTGGCAACTAGTAATAAAAGGTAT 3180 

NENVYAELQHSIKVFVI IFG 
AAACGAGAATGTGTATGCAGAGTTACAACACTCAATTAAAGTCTTTGTAATCATATT^^ 3240 

LGIYSNGVQKVYMGIQKAYI 
ACTTGGAATTTArrCAAATGGTGTGCAAAAAGTTTATATGGGAATACAAA^ 3300 

SNIVNAIFILLSIITLVISS 
AAGTAATATTGTTAATGCCATATTTATATTGTTATCTATTATTACTCTAGT^ 3360 

KLHAGLPVLIVSTLGIQYIS
GAAACTACATGCGGGACTACCAGTTTTAATTGTCAGCACTCTTGGTAT^ 3420 

GIYLTINLI IKRLIKFTKVN 
GGGAATCTATTTAACAATOAATCTTATTATAAAGCGATTAATAAAGTTTACAAAAGTTA^ 3480 

IHAKREAPYLILNGFFFFIL 
CATACATGCTAAAAGAGAAGCTCCATATTTGATATTAAACGGTTTTTTCTTTTO 3540 

QLGTLATWSGDNFI ISITLG 
ACAGTTAGGCACTCTGGCAACATGGAGTGGTGATAACTTTATAATATCTATAACATTGGG 3600 



Figure 6/3 



VTYVAVFSITQRLFQISTVP 
TGTTACTTATGTTGCTGTTTTTAGCATTAC ACAGAGATTATTTCAAATATCTACGGTCCC 3660 

LTIYNIPLWAAYADAHARND 
TCTTACGATTTATAACATCCCGTTATGGGCTGCTTATGCAGATGCTCATGCACGCAATGA 3720 

TQFIKKTLRTSLKIVGISSF 
TACTCAATTTATAAAAAAGACGCTCAGAACATCATTGAAAATAGTGGGTATTTCATC ATT 3780 

LLAFILVVFGSEVVNIWTEG 
CTTATTGGCCTTCATATTAGTAGTGTTCGGTAGTGAAGTCGTTAATATTTGGACAGAAGG 3840 

KIQVPRTFI IAYALWSVIDA
AAAGATTCAGGTACCTCGAACATTCATAATAGCTTATGCTTTATGGTCTGTTATTGATGC 3900 

FSNTFASFLNGLNIVKQQML 
TTTTTCGAATACATTTGCAAGCTTTTTAAATGGTTTGAACATAGTTAAACAACAAATC^ 3960 

AVVTLILIAIPAKYI IVSHF 
TGCTGTTGTAACATTGATATTGATCGCAATTCCAGCAAAATACATCATAGTTAGCCATTT 4020 

GLTVMLYCFIFIYIVNYFIW 
T GGGTTAACTGTTATGTTGTACTGCTTCACTTTa'ATATATATTGTAAATTACTTTATATG 4080 



Start of orf5. End of orf4

M K M 

YKCSFKKHIDRQLNIRG* 
GTATAAATGTAGTTTTAAAAAACATATCGATAGAGAQTTAAATATAAGAGGATOAAAAT G 4140 

KYIPVYQPSLTGKEKEYVNE 
AAATATATACCAGTTTACGAACCGTCATTGACAGGAAAAGAAAAAGAATAT G TA AA T G A A 4200 

CLDSTWISSKGNYIQKFENK 
TGTCTGGACTGAACGTGGATTTCATCAAAAGGAAACTATATTGAGAAGTTTGAAAATAAA 4260 

FAEQNHVQYATTVSNGTVAL 
TTTGCGGAAOAAAACCATCTGCAATATCCAACTACTGTAAGTAATGGAAGGGTTGCTGTT 4320 

HLALLALGISEGDEVIVPTL 
GATTTAGQTTTGTTAGCGCTAGGTATATCGGAAGGAGATQAAGTTATTGTTGGAAGACTQ 4380 

TYIASVNAIKYTGATPIFVD 
ACATATATAGCATGAGTTAATGGTATAAAATAGAGAGGAGGGAGGCGGATTTTCGTTGAT 4440 

SDNETWQMSVSDIEQKITNK 
TGAGATAATGAAAGTTGGCAAATGTGTGTOAG1K3AGATAGAAGAAAAAATGAGTAATAAA 4500 

TKAIMCVHLYGHPCDMEQIV 
AGTAAAGCTATTATOTGTG'TGGATTTATAGGGAGATGGATGTGATATGGAACAAATTGTA 4560 

ELAKSRNLFVIEDCAEAFGS 
GAACTGGGGAAAAGTAGAAATTTCTTTOTAATnKaAAGA'TTGGGGTGAAGGGTTTGGTTGT 4620 

KYKGKYVGTFGDISTFSF FG 
AAATATAAAGGTAAATATGTGGGAAGATTTGGAGATATTTCTAGTTTTAGGTTTTTTGG A- 4680 

NKTITTGEGGMVVTNDKTLY 
AATAAAACTATTAGTAGAGGTOAAGGTGGAATGQTTGTGAGGAATGAGAAAACAGTOTAT 4740 

DRCLHFKGQGLAVHRQYWHD 
QAGCGTTCTTTAGATTTOAAAGGGGAAGGAOTAGOTOTAGATAGGGAA'TATTGGGATOAG 4800 

VIGYNYRMTNICAAIGLAQL 
GTTATAOOGTAGAATTATAGOATOAGAAATATGTQGGGTGCTATAGGATTAGGGOAGTTA 4860 



Figure 6/4 



EQADDFISRKREIADIYKKN 
GAACAAGCTGATGATTOTATATCACGjAAACGTGAAATTGCTOATA'rT'rATAAAAAAAAT 4920 



INSLVQVHKESKDVFHTYWM 
ATCAAGAGTCTTGTACAAGTCGACAAGGAAAGTAAAGATGTTTTTGACACTTATTGGATG 4980 

VSILTRTAEEREELRNHLAD 
GTCTCAATTCTAACTAGGACGGCAGAGGAAAGAGAGGAATTAAGGAATCACC'rTGCAGAT 5040 

KLIETRPVFYPVHTMPMYSE 
AAACTCATCOAAACAAGGCCAGTTTTTTAGGCTGTCGAGACGATGCGAATGTACTCGGAA 5100 

KYQKHPIAEDLGWRGINLPS 
AAATATCAAAAGGAGCCTATAGCTGAGGATCTTGGTTGGCGTGGAATTAA'PTTACCTAGT 5160 

FPSLSNEQVIYICESINEFY 
TTCCCGAGCGTATCGAATGAGGAAGCTATTTATATTTGTGAATGTACTAACGAATTTTAT 5220 



End of orf5. Start of orf6

SDK* MKIALNSD 
AGTGATAAA?A(K:GTAAAATATTGTAAAGGTGATTGM!GAAAATTGCGTTGAAT'rGAOAT 5280 

GFYEWGGGIDFIKYILSILE 
GGATTTTACQAQTGGGGGGGTGGAATTG ATTTTATTAAATATATTCTGTCAATATTAGAA 5340 

TKPEICIDILLPRNDIHSLI 
ACGAAACCAGAAATATGTATCGATATTCTTTTACCGAGAAATGATATACATTCTCTTATA 5400 

REKAFPFKSILKA ILKRERP 
AGAGAAAAAGCATTTCCTTTTAAAAGTATATTAAAAGCAATTTTAAAG AGGGAAAGGCCT 5460 

RWISLNRFNEQYYRDAFTQN 
CGATGGATTTCATTAAATAGATTTAATGAGCAATACTATAGAGATGCCTTTACACAAAAT 5520 

NIETNLTFIKSKS SAFYSYF 
AATATAGAGACGAATCTTACCTTTATTAAAAGTAAGAGCTCTGCCTTTTATTCATAl^ 5580 

DSSDCDVILPCMRVPSGNLN 
GATAGTAGCGATTGTGATGTTATTCTTCCTTGCATGCGTGTTCCTTCGGG;^ 5640 

KKAWIGYIYDFQHCYYPSFF 
AAAAAAGCATGGATTGGTTATATTTATGACTTTCAACACTGTTACTATCCTT^ 5700 

SKREIDQRNVFFKLMLNCAN 
AGTAAGCGAGAAATAGATCyVAAGGAATGTGTTTTTTAAATTGATGCl^^ 5760 

NI IVNAHSVITDANKYVGNY 
AATATTATTGTTAATGCAC ATTCAGTTATTACCGATGCAAATAAATATGTTGGGAAOT^ 5820 

SAKLHSLPFSPCPQLKWFAD 
TCTGCAAAACTACATTCTCTTCCATTTAGTCCATGCCCTCAATTAAAAT^ 5880 

YSGNIAKYNIDKDYFI ICNQ

TACTCTGGTAATATTGCC AAATATAATATTGACAAGGATTATTTTATAATT^ 5940 

FWKHKDHATAFRAFKIYTEY

TTTTGGAAACATAAAGATCATGCAACTGCTTTTAGGGCATTTAAAATOT 6000 

NPDVYLVCTGATQDYRFPGY 
AATCCTGATGTTTATTTAGTATGCACGGGAGCTACTCAAGATTATCGATTCCCTOT 6060 

FNELMVLAKKLGIESKIKIL 
TTTAATGAATTGATGGTTTTGGCAAAAAAGCTCGGAATTGAATCGA^^ 6120 
GHIPKLEQIELIKNCIAVIQ
GGGC ATATACCTAAACTTGAACAAATTGAATTAATCAAAAATTGCATTGCTGT^^ 6180 

PTLFEGGPGGGVTFDAIALG 
CCAACCTTATTTGAAGGCGGGCCTGGAGGGGGGGTAACATTTGACGCTATTGCATTAGGG 6240 

KKVILSDIDVNKEVNCGDVY 
AAAAAAGTTATACTATCTGACATAGATGTCAATAAAGAAGTTAATTGCGGTGATGTATAT 6300 

FFQAKNHYSLNDAMVKADES 
TTCI^TCAGGCAAAAAACCATTATTCATTAAATGACGCGATGGTAAAAGCTG^ 6360 

KIFYEPTTLIELGLKRRNAC 
AAAATTTTTTATGAACCTACAACTCTGATAGAATTGGGTCTCAAAAGACGCAATGCGTGT 6420 



End of orf6

ADFLLDVVKQEIESRS* 
GCAGATTTTCTTTTAGATGTTGTGAAACAAGAAATTGAATCCCGATCT TAATATATTCAA 6480 



Start of orf7

MTKVALITGVTGQDGSY 
GAGGTATATAMSACTAAAGTCGCTCTTATTACAGGTGTAACTGGACAAGATGGATCTTA 6540 

LAEFLLDKGYEVHGIKRRAS 
TCTAGCTGAGTTTTTGCTTGATAAAGGGTATGAAGTTCATGGTATCAAACGCC^^ 6600 

SFNTERIDHIYQDPHGSNPN 
ATCTTTTAATACAGAACGCATAGACC ATATTTATCAAGATCCACATGGTTCTAACCCAAA 6660 

FHLHYGDLTDS SNLTRILKE 
TTTTCACTTCCACTATGGAGATCTGACTGATTCATCTAACCTCACTAGAATTCTAAAG^ 6720 

VQPDEVYNLAAMSHVAVSFE 
GGTACAGCCAGATGAAGTATATAATTTAGCTGCTATGAGTCACGTAGCAGTTTCTT^ 6780 

SPEYTADVDAIGTLRLLEAI 
GTCTCCAGAATATACAGCCGATGTCGATGCAATTGGTACATTACGTTTACTGGAAGCAAT 6840 

RFLGLENKTRFYQASTSELY
TCGCTTTTTAGGATTGGAAAACAAAACGCGTTTCTATCAATC 6900 

GLVQEIPQKESTPFYPRSPY 
TGGACTTGTTCAGGAAATCCCTCAAAAAGAATCCACCCCTTTTTATCCTC 6960 

AVAKLYAYWITVNYRESYGI 
TGCAGTTGCAAAACTTTACGCATATTGGATO^CXSGTAAATTATCXSAGAGT^ 7020 

YACNGILFNHESPRRGETFV 
TTATGCATGTAATGGTATATTGTTCAATCATGAATCTCCACGCCGTGGAG^^ 7080 

TRKITRGLANIAQGLESCLY 
AACAAGGAAAATTACTCGAGGACTTGCJ^TATTGCAC^GGCTTGGAA 7140 

LGNMDSLRDWGHAKDYVRMQ 
TTTAGGGAATATGGATTCGTTACGAGATTGGGGACATGCAAAAGATTATGTTAGAATGCA 7200 

WLMLQQEQPEDFVIATGVQY 
ATGGTTGATGTTACAACAGGAGCAACCCGAAGATTTTGTGATTGCAACAGGAGTC 7260 

SVRQFVEMAAAQLGIKMSFV 
CTCAGTCCGTCAGTTTGTCGAAATGGCAGCyVGCACAACrTGGTATTAAGAT^ 7320 
GKGIEEKGIVDSVEGQDAPG
TGGTAAAGGAATCGAAGAAAAAGGCATTGTAGATTCGGTTGAAGGACAGGATGCTCCAGG 7380 

VKPGDVIVAVDPRYFRPAEV 
TGTGAAACCAGGTGATGTCATTGTTGCTGTTGATCCTCGTTATTTCCGACCAGCTC 7440 

DTLLGDPSKANLKLGWRPEI 
TGATACTTTGCTTGGAGATCCGAGCAAAGCTAATCTCAAACTTGGTTGGAGACCAGAAAT 7500 

TLAEMISEMVAKDLEAAKKH 
TACTCTTGCTGAAATGATTTCTGAAATGGTTGCCAAAGATCTTGAAGCCGCTAAAAAACA 7560 



Start of orf8. End of orf7

M M M N K 
SLLKSHGFSVSLALE* 
TTCTCTTTTAAAATCGCATGGTTTTTCTGTAAGCTTAGCTCTGGAM^ 7620 

QRIFIAGHQGMVGSAITRRL 
CAACGTATTTTTATTGCTGGTCACCAAGGAATGGTTGGATCAGCTATTACCCGACGCCTC 7680 

KQRDDVELVLRTRDELNLLD 
AAACAACGTGATGATGTTGAGTTGGTTTTACGTACTOSGGATGAATTC 7740 

S SAVLDFFSSQK IDQVYLAA 
AGTAGCGCTGTTTTGGATTTTTTTTCTTCACAGAAAATCGACCAGGT^ 7800 

AKVGGILANSSYPADFIYEN 
GCAAAAGTCGGAGGTATTTTAGCTAACAGTTCm'ATCCTGCCGATTTTATATATGAGAAT 7860 

IMIEANVIHAAHKNNVNKLL 
ATAATGATAGAGGCGAATGTCATTCATGCTGCCCACAAAAATAATGTAAATAAACTGCTT 7920 

FLGSSCIYPKLAHQPIMEDE 
TTCCTCGGTTCGTCGTGTATTTATCCTAAGTTAGCACACCAACCGATTATGGAAGACGAA 7980 

LLQGKLEPTNEPYAIAKIAG 
TTATTACAAGGGAAACTTGAGCCAACAAATGAACCTTATGCTATCGCAAAAATTC 8040 

IKLCESYNRQFGRDYRSVMP

ATTAAATTATGTGAATCTTATAACCGTCAGTTTGGGCGTGATTACCGTTCAGT^ 8100
TNLYGPNDNFHPSNSHVIPA

ACCAATCTTTATGGTCCT^TGACAATTTTCATCC^GTAATT^ 8160 

LLRRFHDAVENNSPNVVVWG

CTTTTGCGCCGCTTTCATGATGCTGTGGAAAACAATTCTCCG 8220 

SGTPKREFLHVDDMASASIY

AGTGGTACTCCAAAGCGTGAATTCTTACATGTAGATGATATGGCT^ 8280 

VMEMPYDIWQKNTKVMLSHI 
GTCATGGAGATGCCATACGATATATGGCAAAAAAATACTAAAGTAATGTTGTCTCATATC 8340 

NIGTGIDCTICELAETIAKV 
AATATTGGAACAGGTATTGACTGCACGATTTGTGAGCTTGCGGAAACAATAG 8400 

VGYKGHITFDTTKPDGAPRK 
GTAGGTTATAAAGGGCATATTACX5TTCGATACAACAAAGCCCGATGGAGCCCCTCGAAAA 8460 

LLDVTLLHQLGWNHKITLHK 
CTACTTGATGTAACGCTTCTTCATCAACTAGGTTGGAATCATAAAATTACCCTTC 8520 
End of orf8

GLENTYNWFLENQLQYRG* 
GGTCTTGAAAATACATACAACTCGTTTCTTGAAAACCAACTTCAATATCGGGGG TAATAA 8580 

Start of orf9

MFLHSQDFATIVRSTPLISI 
TGTTTTTACATTCCCAAGACTTTGCCACAATTGTAAGGTCTACTCCTCTTAT^ 8640 

DLIVENEFGEILLGKRINRP 
ATTTGATTGTGGAAAACGAGTTTGGCGAAATTTTGCTAGGAAAACGAATCAACCGCCCGG 8700 

AQGYWFVPGGRVLKDEKLQT 
CACAGGGCTATTGGTTCGTTCCTGGTGGTAGGGTGTTGAAAGATGAAAAATTGCAGACAG 8760 

AFERLTEIELGIRLPLSVGK 
CCTTTGAACGATTGACAGAAATTGAACTAGGAATTCGTTTGCCTCTCTCTGTGGGTi^ 8820 

FYGIWQHFYEDNSMGGDFST 
TTTATGGTATCTGGC AGCACrrrCTACGAAGACAATAGTATGGGGGGAGACTTO 8880 

HYIVIAFLLKLQPNILKLPK 
ATTATATAGTTATAGCATTCCTTCTTAAATTACAACCAAACATTTTGAAATTACCGAAGT 8940 

SQHNAYCWLSRAKLINDDDV 
CACAACATAATGCTTATTGCTGGCTATCGCGAGCAAAGCTGATAAATGATGACGATGTGC 9000 

HYNCRAYFNNKTNDAIGLDN 
ATTATAATTGTCGCGC ATATTTTAAC AATAAAACAAATGATGCGATTGGCTTAGATAATA 9060 



Start of orf10. End of orf9

MSDAPI lAVVMAGGTGS 
KDIICLMRQ* 

AGGATATAATMIGTCTGATGCGCCAATAATTGCTGTAGTTATGGCCGGTGGTA^ 9120 

RLWPLSRELYPKQFLQLSGD 
TCGTCTTTGGCCACTTTCTCGTGAACTATATCCAAAGCAGTTTTTACAACTC 9180 

NTLLQTTLLRLSGLSCQKPL 
TAACACCTTGTTACAAACGACTTTGCTACGACTTTCAGGCCTATCATGTC^^ 9240 

VITNEQHRFVVAEQLREINK 
AGTGATAACAAATGAACAGCATCGCTTTGTTGTGGCTGAACAGTTAAGGGAAATA^ 9300 

LNGNI ILEPCGRNTAPAIAI 
ATTAAATGGTAATATTATTCTAGAACCATGCXK3GCGAAATACTGCACCAGCAATAGCGAT 9360 

SAFHALKRNPQEDPL LLVLA 
ATCTGCGTTTCATGCGTTAAAACGTAATCCTCAGGAAGATCCATTGCrTCT^^ 9420 

ADHVIAKESVFCDAIKNATP 
GGCAGACCACGTTATAGCTAAAGAAAGTGTTTTCTGTGATGCTATTAAAAATGCAACTC 9480 

IANQGKIVTFGI IPEYAETG
CATCXX:TAATCAAGGTAAAATTGTAACGTTTGGAATTATACCAGA^ 9540 

YGYIERGELSVPLQGHENTG 
TTATGGGTATATTGAGAGAGGTGAACTATCTGTACCGCTTCAAGGGCATGAAAATACTOT 9600 

FYYVNKFVEKPNRETAELYM 
TTTTTATTATGTAAATAAGTTTGTCGAAAAGCCTAATCGTGAAACCGCAGAATTGTAT^ 9660 

TSGNHYWNSGIFMFKASVYL 
GACTTCTGGTAATCACTATTGGAATAGTGGAATATTCATGTTTAAGGC^ 9720 



Figure 6/8 



EELRKFRPDIYNVCEQVAS S 
TGAGGAATTGAGAAAATTTAGACCTGACATTTACAATGTTTGTGAACAGGTTGrc^^ 9780 

SYIDLDFIRLSKEQFQDCPA 
CTCATAC ATTGATCTAGATTTTATTCGATTATCAAAAGAACAATTTCAAGAT^ 9840 

ESIDFAVMEKTEKCVVCPVD 
TGAATCTATTG ATTTTGCTGTAATGGAAAAAACAGAAAAATGTGTTGTATGCCCTGTTG A 9900 

IGWSDVGSWQSLWDISLKSK 
TATTGGTTGGAGTGACGTTGGATCTTGGCAATCGTTATGGGAC ATTAGTCTAAAATC 9960 

TGDVCKGDILTYDTKNNYIY
AACAGGAGATGTATGTAAAGGTGATATATTAACCTATGATACTAAGAATAATTATATCTA 10020 

SESALVAAIGIEDMVIVQTK 
CTCTGAGTCAGCGTTGGTAGCCGCCATTGGAATTGAAGATATGGTTATCGTGCAAA 10080 

DAVLVSKKSDVQHVKKIVEM 
AGATGCCGTTCTTGTGTCTAAAAAGAGTGATGTACAGCATGTAAAAAAAATAGTCGAAAT 10140 

LKLQQRTEYISHREVFRPWG 
GCTTAAATTGCAGCAACGTACAGAGTATATTAGTCATCGTGAAGTTTTCCGACCATGGGG 10200 

KFDSIDQGERYKVKKIIVKP 
AAAATTTGATTCGATTGACCAAGGTGAGCGATACAAAGTCAAGAAAATTATTGTGAAACC 10260 

GEGLSLRMHHHRSEHWIVLS 
TGGTGAGGGGCTTTCTTTAAGGATGCATCACCATCGTTCTGAACATTGGATCGTGCTTTC 10320

GTAKVTLGDKTKLVTANESI 
TGGTACAGCAAAAGTAACCCTTGGCGATAAAACTAAACTAGTCACCGCAAATGAATCGAT 10380 

YIPLGAAYSLENPGI IPLNL 
ATACATTCCCCTTGGCGCAGCGTATAGTCTTGAGAATCCGGGCATAATCCCTCTTAATCT 10440 

IEVSSGDYLGEDDIIRQKER
TATTGAAGTCAGTTCAGGGGATTATTTGGGAGAGGATGATATTATAAGACAGAAAGAACG 10500 

End of orf10. Start of orf11

YKHED* MKSLTCFKAYDIR 
TTACAAACATGAAGAT TAAC ATMGAAATCTTTAACCTGCTTTAAAGCCTATGATATTCG 10560 

GKLGEELNEDIAWRIGRAYG 
CGGGAAATTAGGCXSAAGAACTGAATGAAGATATTGCCTGGOSCATTGGGCGTGCCTA 10620 

EFLKPKTIVLGGDVRLTSEA 
CGAATTTCTCAAACCGAAAACCATTGTTTTAGGCGGTGATGTCCGCCTCy^^ 10680 

LKLALAKGLQDAGVDVLDIG 
GTTAAAACTGGCGCTTGCGAAAGGTTTACAGGATGCGGGCGTCGATGTGCTG^ 10740 

MSGTEEIYFATFHLGVDGGI 
TATGTCCGGCACCGAAGAGATCTATTTCGCCACGTTCCATCTCGGAGTGGATGGC 10800 

EVTASHNPMDYNGMKLVREG 
CGAAGTTACCGCCAGCCATAACCCGATGGATTACAACGGC ATGAAGCTGGTGCGCGAAGG 10860 

ARPISGDTGLRDVQRLAEAN 
GGCTCGCCCGATCAGCGGTGATACCGGACTGCXXrGATGTCCAGCGTCTGGCAGAAGCCAA 10920 

DFPPVDETKRGRYQQINLRD 
TGACTTCCCTCCTGTCGATGAAACCAAACGTGGTCGCTATCAGCAAATCAATCTGTO 10980 
AYVDHLFGYINVKNLTPLKL
CGCT^ACGTTGATC ACCTGTTCGGTTATATCAACGTCAAAAACCTC ACGCCGCTCAAGCT 11040 



VINSGNGAAGPVVDAIEARF 
GGTG ATCAACTCCGGGAACGGCGC AGCGGGTCCGGTGGTGGACGCC ATTGAAGCCCGATT 11100 

KALGAPVELIKVHNTPDGNF 
TAAAGCCCTCGGCGCACCGGTGGAATTAATCAAAGTAC AC AACACGCCGGACGGC AATTT 11160 

PNGI PNPLLPECRDDTRNAV 
CCCCAACGGTATTCCTAACCCGCTGCTGCCGGAATGCCGCGACGACACCCGTAATGCGGT 11220 

IKHGADMGIAFDGDFDRCFL 
CATCAAACACGGCGCGGATATGGGCATTGCCTTTGATGGCGATTTTGACCGCTGT^ 11280 

FDEKGQFIEGYYIVGLLAEA 
GTTTGACGAAAAAGGGCAGTTTATCGAGGGCTACTACATTGTCGGCCTGCTGGC AGAAGC 11340 

FLEKNPGAKI IHDPRLSWNT 
GTTCCTCGAAAAAAATCCCGGCGCGAAGATCATCCACGATCCACGTCTCTCCTGGAACAC 11400 

VDVVTAAGGTPVMSKTGHAF 
CGTTGATGTGGTGACTGCCGCAGGCGGCACCCCGGTAATGTCGAAAACCGGACACGCCTT 11460 

IKERMRKEDAIYGGEMSAHH 
TATTAAAGAACGTATGCGCAAGGAAGACGCCATCTACGGTGGCGAAATGAGCGCTCACCA 11520 

YFRDFAYCDSGMIPWLLVAE 
TTACTTCCGTGATTTCGCTTACTGCGAC AGCGGCATGATCCCGTGGCTGCTGGTCGC^ 11580 

LVCLKGKTLGEMVRDRMAAF 
ACTGGTGTGCCTGAAAGGAAAAACGCTGGGCGAAATGGTGCGCGACCGGATGGCGGCGTT 11640 

PASGEINSKLAQPVEAINRV 
TCCGGCAAGCGGTGAGATCAACAGCAAACTGGCGCAACCCGTTGAGGCAATTAATCGCGT 11700 

EQHFSREALAVDRTDGISMT 
GGAACAGCATTTTAGCCGCGAGGCGCTGGCGGTGGATCGCACCGATGGCATCAGCATGAC 11760 

FADWRFNLRS SNTEPVVRLN 
CTTTGCCGACTGGCGCTTTAACCTGCGCTCCTCCAACACCGAACCGGTGGTGCGGT^^ 11820 

VESRGDVKLMEKKTKALLKL 
TGTGGAATC ACGCGGTGATGTAAAGCTAATGGAAAAGAAAACTAAAGCTCTTCTTAAATT 11880 

End of orf11

L S E * 

GCTAAGTGAGTAATTATTTACATTAATCATTAAGCGTATTTAAGATTATATTAAAGTAAT 11940
GTTATTGCGGTATATGATGAATATGTGGGCTTTTrrATGTATAACGACTATACCGC^ 12000 



Start of H-repeat

TTATCTAGG AAAAGATTAATAG AAATAAAGTTTTGTACTGACCAATTTGCATTTC^ 12060 

ACGATTGAGACGTTCCTTTGCTTAAGACATTTTTTCATTC 12120

CCTTATATAAAAAGGAGAAC AAAATG6AACTTAAAATAATTGAGACAATAGATTTTTATT 12180 

ATCCCTGTTTACGATATTATAGCCAAAGTTCTATCCTGCATCAGTCCTGCAATAT^ 12240 

G AGTGCTTTGTTAACTCAATACATGTCTGCCATTTTCCA^ 12300 

ATTGATCGTAAAACACTTCGGCACACn-ATGAC AAGAGTCGTCGCAGAGGAGTGG^^ 12360 
GTCATTAGTGCGTTTCAGCAATGCACAGTCTGGTCCTCGGATAGATCAAGACGGATGAGA 12420

AACCTAATGCGTTCAC AGTTATTCATGAACTTTCTAAAATG ATGGGTATTAAAGGAAAAA 12480 

TAATCATAACTGATGCGATGGCTTGCCAGAAAGATATTGCAGAGAAGATATAAAAACAGA 12540 

G ATGTGATTATTTATTCGCTGTAAAAGGAAATAAGAGTCGGCTTAATAGAGTCTTTC 12600 

AGATATTTACGCTGAAAGAATTAAATAATCCAAAACATGACAGTTACGCAATTAGTGAAA 12660 

AGAGGCACGGCAGAGACGATGTCCGTCTTCATATTGTTTGAGATGCTCCTGATC^ 12720 

TTGATTTCACGTTTGAATGGAAAGGGCTGCAGAATTTATGAATGGCAGTC 12780 

C AATAATAGCAGAGCAAAAGAAAGAATCCGAAATGACGATCAAATATTATATTAGATCTG 12840 

CTGCTTTAACCGCAGAGAAGTTCGCCACAGTAAATCGAAATCACTGGCGCATGGAGAATA 12900 

AGTTGC AC AGTAGCCTGATGTGGTAATGAATGAAATCGACTAT AATATAAGAAGGCGAGT 12960 

TGCATTCGAATGATTTTCTAGAATGCGGCACATCGCTATTAATATCTGACAATGATAATC 13020 

TATTCAAGGCAGGATTATCATGTAAGATGCGAAAAGCAGTCATGGACAGAAACTTCC^^ 13080 

End of the H-repeat

CGTCAGGCATTGCAGCGTGCGGGCTTTCATAATCTTGCATTGGTTTTGATAAGATATTTC 13140

Start of orf12

MNLYGIFGAGSYGRE 
TTTGGAGATGGGAAAMQAATTTGTATGGTATTTTTGGTC^ 13200 

TIPILNQQIKQECGSDYALV 
ACAATACCCATTCTAAATCAACAAATAAAGCAAGAATGTGGTTCTGACTATGCTCT^ 13260 

FVDDVLAGKKVNGFEVLSTN 
TTTGTGGATGATGTTTTCGCAGGAAAGAAAGTTAATGGTTTTGAAGTGC^^ 13320 

CFLKAPYLKKYFNVAIANDK 
TGCTTTCTAAAAGCCCCTTATTTAAAAAAGTATTTTAATGTTGCTATTC 13380 

IRQRVSESILLHGVEPITIK 
ATACGACAGAGAGTGTCTGAGTCAATATTATTAGACGGGGTTGAACCAATAACTATAAAA 13440 

HPNSVVYDHTMIGSGAI ISP 
CATCCT^TAGCGTTGTTTATGATaVTACTATGATAGGTAGTGGOK:™ 13500 

FVTISTNTHIGRF FHANIYS 
TTTGTTACAATATCTACrAATACTCATATAGGGAGGTTTTTT^ 13560 

YVAHDCQIGDYVTFAPGAKC 
TACGTTGCACATGATTGTCAAATAGGAGACTATGTTACATTTGC^ 13620 

NGYVVIEDNAYIGSGAVIKQ 
AATGGATATGTTGTTATT6AAGACAATGCATATATAGGCTCGGGTGCAGTAAOT 13680 

GVPNRPLIIGAGAIIGMGAV
GGTGTTCCTAATCGCCCACTTATTATTGGCGCGGGAGCCATTATAGGTATGGGGGCTGTT 13740 

VTKSVPAGITVCGNPAREMK 
GTCACTAAAAGTGTTCCTGCCGGTATAACTGTCTGCGGAAATCCAGCAAGAGAAATGA^ 13800 

End of orf12

R S P T S I * 
AGATCGCCAACATCTATTTAATGGGAATGCGAAAACACGTTCCAAATGGGACTAATGTTT 13860 
AAAATATATATAATTTCGCTAATTTACTAAATTATGGCTTCTTTTTAAGCTATCCTTTAC 13920
TTAGTTATTACTGATACAGCATGAAATTTATAATACTCTGATACATTTTTATACGTTATT 13980 
C AAGCCGCATATCTAGCGGTAACCCCTGACAGGAGTAAACAATG 14024 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATAATATCAACAAG

AACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGCGTATTAACAGC 

GCGAAGGATGACGCCGCGGGTa^GGCGATTGCTAACCGTTTTACTTCTAACATTAAAGGC 

CTGACTCAGGCTGCACGTAACGCCAACGACGGTATTTCTGTTGCACAGACCACTGAAGGC 

GCGCTGTCCGAAATCAACAACAACTTAGAGOSTATCCGTGAGCTGACGGTTCAM 

ACCGGGACTAACTCTGATTCGGATCTGGACTCCATTCAGGACGAAATCAAATCCCGTCT 

GACGAAATTGACCGCGTATCCGGTCAGACCCAGTTCy^CGGCGTGAACGTACTGGCAAAA 

GACGGTTCGATGAAAATTCAGGTAGGTGC6AACGACGGCCAGACTAT»CTATTGATCTG 

AAGAAAATTGACrrCTGATACGCTGGGGCTGAATGGTTTTAACGTGAATGGTTCCGGTACG 

ATAGCCAATAAAGCGGCGACCATTAGCGACCTGACAGCAGCGAAAATGGATGCTGCAACT 

AATACTATAACTACAACAAATAATGCGCTGACTGCATCyU^GGCCCTTGATCAACTGAAA 

GATGGTGACACTGTTACTATCAAAGCAGATGCAGCTCAAACTGCCACGGTCTATACATAC 

AATGCATCTGCTGGTAACTTCTCATTCAGTAATGTATCGAATAATACTTaVGCAAAAGC^ 

GGTGATGTAGCAGCTAGCCTTCTCCCGCCGGCTGGGCAAACrGCTAGTGGTGTTTACAAA 

GCAGCAAGCGGTGAAGTGAACTTTGATGTTGATGCGAATGGTAAAATTACAATCGGAGGA 

CAGGAAGCCTATTTAACTAGTGATGGTAACriTAACTACAAACGATGCTGGT^ 

GCGGCTACGCTTGATGGTTTATTGAAGAAAGCTGGTGATGGTCAATCAATC^^ 

AAGACTGCATCAGTCACGATGGGGGGAACAACTTATAA(nTTAAAACGGGTGC^ 

GGTGCTGCAACTGCTAACGCAGGGGTATCGTTCACTGATACAGCTAGCAAAGAAACCGT^ 

TTAAATAAAGTGGCTACTlGCTAAACAAGGCACAGCAGTTGCAGCTAACGGTGATAaiTC^ 

GCAACAATTACCTATAAATCTGGCGTTCAGACGTATCAGGCGGTATTTGCCGCAGGTC^ 

GGTACTGCTAGCGCAAAATATGCCGATAATACTGACGTTTCTAATGCAACAGC7VACATAC 

ACAGATGCTGATGGTGAAATGACTACAATTGGTTCATACyiCCACGAAGTATTCA^ 

GCTAACAACGGCAAGGTAACTGTTGATTCTGGAACTGGTTCGGGTAAATATGC^ 

GTCGGGGCTGAAGTATATGTTAGTGCTAATGGTACnTTAACAACAGATGCAACTAGCG^ 

GGCACAGTAACAAAAGATCCACTGAAAGCTCTGQATGAAGCTATCAGCTCCATCGACAAA 

TTCCGTTCATCCCTGGGGGCTATCCAAAACCGTTTGGATTCCGCCGTCACCAACCT^ 

AACACCACTACCAACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACT^ 

GAAGTGTCCAACATGTCGAAAGCGCAGATTATCCAGCT^GGCCGGTAACTCCGTGCTGGC^ 

AAAGCCAACCAGGTACCGCAGCAGGTTCTGTCTCTACTGCAGGGTTAA 
AACAAATCTCAGTCTTCTCTTAGCTCTGCTATT

GAGCGTCTGTCTTCTGGTCTGCGTATTAACAGCGCAAAAGACGATGCAGCAGGTCAGGCG 

ATTGCTAACCGTTTTACGGCAAATATTAAAGGTCrcACCCAGGCTTCCCGTAACGCA^ 

GATGGTATTTCTGTTGCX5CAGACCACTGAAGGTGCGCTGAATGAAATTAACAACAACCT 

CAGCGTATTCGTGAACTTTCTGTTCAGGCAACTAACGGTACTAACTCTGACAGTG^^ 

ACCTCCATCCAGTCCGAAATCCAGCAGCGTCTGAGTGAAATTGACCGTGTTTCTGGTCAG 

ACTCAGTTTAACGGCGTTAAAGTGCTGGCTTCTGATCAGGATATGACTATTCAGGT^^ 

GCAAACGACGGCGAAAOUVTTACTATTAAACTGCAGGAAATTAATTCCGACACACTGGGA 

TTATCTGGTTTTGGTATTAAAGATCCTACrAAATTAAAAGCCGCAACGGCTGAAACA^ 

TATTTTGGATCGACAGTTAAGCTTGCTGACGCTAATACACTTGATGCA^^ 

ACAGTTAAAGGCACTACGACTCCGGGCCAACGTGACGGTAATATTATGTCTGATGCTAAC 

GGTAAGTTGTACGTTAAAGTT6CCGGTTCAGATAAACCCGCTGAAAATGGTTATTATGAA 

GTTACTOTGGAGGATGATCCGACATCTCCT^TGCAGGTAAGCTGAAGCTGGGGGCTCTA 

GCGGGTACCCAGCCraVAGCTGGTAATTTAAAGGAAGTCACAACGGTGAAAGGGA^ 

GCTATTGATGTTCAGTTGGGTACTGATACCGCAACCGCTTCTATCACAGGTGCAAAACT 

TTTAAGTTAGAAGACGCCAATGGCAAAGATACTGGTTCATTTGCGTTGA 

GGTAAACAGTATGCAGCGAATGTTGATCAGAAAACAGGAGCAGTTTCCGTTAAAAaUlT^ 

TCTTACACTGATGCTGACGGTGTCAAACACGACAATGTTAAAGTTGAACTGGGT^^ 

GATGGCAAAACCGAAGTTGTAACTGCAACCGATGGCAAAACTTACAGTGTTAGT^ 

CTAGGTAAGAGCCTGAAAACTGATTCTATTGCAGCiATTTCTACGCAGAA^ 

CCTTTGGCTGCTATCGATAAAGCACTGTCTCAGGTTGACTCGTTGCGTTCTAACCTAGGT 

GCAATTCAAAATCGTTTCGACTCTGCCATavCCAACCTTGGCAAC^ 

TCTTCTGCCCGTAGCCGTATCGAAGATGCTGACTACGCGACCGAAGTGTCTAACATGTCT 

CGTGCGCAGATCCTGCAACAAGCGGGTACCTCTGTTCTGGCGCAG 
AACAAATCTCAGTCTTCTCTGAGCTCCGCCATTGAACGTCTCTCTTC

TGGCCTGCGTATTAACAGTGCTAAAGATGACGCAGCAGGTCAGGCGATTGCrAACCGTTT 

TACAGCAAATATTAAAGGTCTGACTCAGGCTTCCCGTAACGCGAATGATGGTATTTCTGT 

TGCGCAGACCACTGAAGGTGCGCTTTCTGAAATCAACAATAACTTACAGCGTATTCGTGA 

ATTGTCAGTACAGGCCACTAATGGTACAAACTCTGACTCCGACCTGAATTCAATTCAGGA 

TGAAATTAOVCAACGCCTTAGTGAAATTGATCGTGTTTCTAACCAGACACAAl^ 

TGTAAAAGTTCTGGCTTCTGATCAGACTATGAAAATTCAAGTAGGTGCGAACGATGGTGA 

AACCATTGAGATTGCCCTTGATAAAATTGATGCTAAAACCTTGGGGCTTGAT^ 

CGTAGCACCAGGAAAAGTTCCAATGTCCTCTGCGGTTGCACTTAA6AGCGAAGCCGCTCC 

TGACTTTUVCTAAGGTAAATGCAACTGATGGTAGTGTGGGAGGTGCTAAAGCATTCGGTAG 

CAATTATAAAAATGCTCATGTTGAAACTTATTTTGGTACC^ 

GGATACAACTGATGCGACCGGTACTGCAGGAACAAAAGTTTATCAAGTACAGGTGGAAGG 

GCAGACTTATTTTGTTGGTCAAGATAATAATACCTACACGAACGGTTI^ 

ACAAAACTCTACAGGTTATGAAAAAGTTCAGGTGGGTGGTAAGGATGTT«GTTA« 

CTTTGGTGGTCGTGTAACTGCATTTGTTGAAGATAATGGTTCTGCCACATCA 

AGCTGCGGGTAAAATGGGTAAAGCATTAGCTTATAATGATGCACCJUITGTCTC 

TGGGGGAAAAAACCTAGATGTCCACCAAGTACAAGATACCCAAGGGAATCCTGTACCTAA 

TTCATTTGCTGCTAAAACATCAGACGGCACCTACATTGCAGTAAATGTAGATGC^ 

AGGTAACACGTCTGTTATTACTGATCCTAATGGTAAGGOVGTTGAATGGGCAGTAAAAA^ 

TGATGGTTCTGCACAGGCAATTATGCGTGAAGATGATAAGGTTTATACAGCCAATAT^ 

GAATAAGACGGCAACCAAAGGTGCTGAACTOVGTGCCrCAGATTTGAAAGCCTTAGC^ 

CACAAATCCATTATCCACATTAGACGAAGCTTTGGCAAAAGTTGATAAG 

TTTGGGTGCTIGTACTUVAACCGTTTCGACTCTGCCATCACCAACC^^ 

CAACCTGTCTTCTGCCOSTAGCCGTATAGAAGATGCTGACTACGCAACCGAAGTGTCTAA 

CATGTCTCGTGCGCAGATCCTGCAACAAGCGGGTACCTCTGTTCTGGCACAG 
AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAG

CGCCTCTCTTCTGGTCTGCGTATTAACAGCGCTAAAGATGACGCCGCGGGCCAGGCGATT 

GCTAACCGCTTTACTTCTAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCAACGAC 

GGTATTTCTCTGGCGCAGACGGCTGAAGGCGCGCTGTCAGAGATTAACAACAACTTG 

CGTATTCGTGAACTGACCGTTCAGGCCTCTACCGGCACGAACTCTGATTCCGACCTGTCT 

TCTATTOVGGACGAAATCAAATCCCGTCTTGATGAAATTGACCGTGTATCTGGTCAGACC 

CAGTTCAACGGTGTGAACGTGCTGTCGAAAAACGATTCGATGAAGATTCAGATT^ 

AATGATAACCAGACGATCAGCTVTTGGCTTGCAACAAATCGACAGTACCACT^ 

AAAGGATTTACCGTGTCCGGCATGGCGGATTTCAGCGCGGCGAAACTGACGGCTGC^^ 

GGTACAGCAATTGCn^CTGCGGATGTCAAGGATCCTGGGGGTAAACAAGTCAAT^ 

TCTTACACTGACACCGCGTCTAACAGTACrAAATATGCGGTCXSTTGATTCTGCAAC^ 

AAATACATGGAAGCCACTGTAGTCATTACCGGTACGGCGGCGGCGGTAACTCTTGGT^ 

GCGGAAGTGGCGGGAGCCGCTACAGCCGATCCGTTAAAAGCACTGGATGCCGCAATCGCT 

AAAGTCGACyUATTCCGCTCCTCCCTCGGTGCCGTTCAAAACCXSTCTGGATT 

ACCAACCTGAACAACACCACCACCAACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCC 

GACTATCCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATTATCa^GCAGGOK^ TCCGTGCTGTCTAA 
Figure 10



AACAAAAACCAGTCTGCGCTGTCGACTTCTAT 

CGAGCGCCTCTCTTCTGGTCTGCGTATTAACAGCGCTAAAGATGACGCCGCGGGCCAGGC 

GATTGCTAACCGCTTCACTTCTAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCAA 

CGACGGTATCTCTCTGGCGCAGACCACTGAAGGCGCGCTGTCTGAAATCAACAACAACTT 

GCAGCGTGTGCGTGAGTTGACCGTTCAGGCGACGACCGGGACTAACTCI^TTCTGACCT 

GTCTTCTATTCAGGACGAAATCAAATCCCGTCTGGATGAAATTGATCGCGTTTCCGGTCA 

GACCCAGTTCAACGGCGTGAATGTGCTGGCGAAAGATGGTTCGATGAAGATTCAGGTTGG 

CGCGAATGATGGGCAGACTATTAGCATTGATTTGCAGAAGATTGACTCTTCTACATTAGG 

ACTGAACGGTTTCTCCGTTTCGGGTCAGTCACTTAACGTTAGTGATTCCATTACTCAAAT 

TACCGGTGCCGCCGGGACAAAACCTGTTGGTGTTGATTTCACTGCTGTTGCGAAAGATCT 

GACTACTGCGACAGGTAAAACAGTCGATGTTTCTAGCCTGACGTTACACAACACTCTGGA 

TGCGAAAGGGGCTGCTACATCACAGTTCGTCGTTCAATCCGGCAATGATTTCTACTCCGC 

GTCGATTAATCATACAGACGGCAAAGTCACGTTGAATAAAGCCGATGTCGAATACACAGA 

CACCGATAATGGACTAACGACTGCGGCTACTCAGAAAGATO^CTGATTAAAGTTGCCGC 

TGACTCTGACGGCTCGGCTGCGGGATATGTAACATTCCAAGGTAAAAACTACGCTACAAC 

GGTTTCAACGGCACTTGATGATAATACTGCGGCAAAAGCAACAGATAATAAAGTTGTTG 

TGAATTATCAACAGCAAAACCGACTGCACAGTTCrrCAGGGGCTTCTTCTGCTGATCCACT 

GGCACTTTTAGAO^GCTATTGCACAGGTTGATACTTTCCGCTCCTCCCTCGGTGCC^ 

GCAAAACCGTCTGGATTCCGCAGTAACCAACCTGAACAACACCACCACCAACCTGTCTGA 

AGCGCAGTCCCGTATTCAGGACGCCGACTATGCTACAGAAGTGTCCAACATGTCGAAAGC 

GCAGATCATCCAGCAGGCAGGTAACTCGGTGCTGTCCAAA 
Figure 11



ATGGCACAAGTCATTAATACCAACAGCCTCTCGC 

TGATCACTCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTC 
TGTCTTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCGGGTCAGGCGATTGCTA 
ACCGTTTTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAACGACGGTA 
TTTCTGTTGCGCAGACCACCGAAGGCGCGCTGTCCGAAATTAACAACAACTTACAGCGTA
TTCGTGAACTGACGGTTCAGGCTTCTACCGGGACTAACTCTGATTCGGATCTGGACTCCA 

TTCAGGACGAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGTCAGACCCAGT

TCAACGGCGTGAACGTACTGGCAAAAGACGGTTCGATGAAAATTCAGGTTGGTGCGAATG

ACGGCCAGACTATCACTATTGATCTGAAGAAAATTGACTCTGATACGCTGGGGCTGAATG

GGTTTAATGTGAACGGCAAAGGGGAAACGGCTAATACGGCAGCAACCCTGAAAGATATGT

CTGGATTCACAGCTGCGGCGGCACCAGGGGGAACTGTTGGTGTAACTCAATATACT^

AATCGGCTGTAGCAAGTAGCGTAGATATTCTAAATGCTGTTGCTCGCGCAGATGGA^

AAGTTACAACTAGCGCCGATCTTGGTTTTGGTACACCAGCCGCTGCTGTAACCTATACCT 
ACAATAAAGACACTAATTCTITATTCCGCCGCTTCTGATGATATTTCCAGCGCTAACCT^ 

CTGCTTTCCTCJULTCCRA^GGCCGGAGATACGACTAAAGCTACYVGTTACAAT^
AAGATCAAGATGTAAACATCGATAAATCCGGTAATTTAACTGCTGCTGATGATGGCG

TACTTTATATGGATGCTACCGGTAACTTAACTAAAAATAATGCTGGTGGTGATACAC^ 

CTACTTTGGCTAAACTTGCTACTGCTACTGGTGCTAAAGCCGCGACCATCCAAACTGAT^ 

AAGGAACATTCACCAGTGACGGTACAGOSTTTGATGGTGCATCAATGTCCa^TT^ 

ATACATTTGCAAATGCAGTAAAAAATGACACTTATACTGCCACrGTAGGTGCTA^ 

ATAGCGTAACAACAGGTTCTGCTGCTGCAGACACOJCTTATATGAGCAATGGGGTT^ 

GTGATACTCCGCCAACITACTATGCACAAGCTGATGGAAGTATCACAACTACT^ 

CGGCTGCCGGTAAACTGGTCTACAAAGGTTCCGATGGTAAGTTAACAACGGATACGACTA 

GCAAAGCAGAATCAACATCAGATCCGCTGGCAGCTCTTGACGACGCTATCAGCCAGATCG 

ACAAATTCCGCTCCTCCCTGGGTGCXSGTGaU^CCGTCrrGGATTCCGC^ 

TGAAOUICACCACTACCAACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATO 

CGACCGAAGTGTCa^CATGTCX3AAAGCGCAGATTATCCaWK:AGGCCGGTAACTCra^ 

TGGCAAAAGCTAACCAGGTTCCXXyiGCAGGTTCTGTCTCTGCTGCAGGGTO 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCACAGAC CACCGAAGGC 
GCGCTGTCTG AAATCAACAA CAACTTACAG CGTATCCGTG AGCTGACGGT TCAGGCTTCT 
ACCGGAACTA ACTCTGATTC GGATCTGGAC TCCATTCAGG ACGAAATCAA ATCCCGTCTT 
GATGAAATTG ACCGCGTATC CGGCCAGACC CAGTTCAACG GCGTGAACGT ACTGGCAAAA 
GACGGTTCGA TGAAAATTCA GGTTGGTGCG AATGACGGTG AAACTATCAC TATCGACCTG 
AAGAAAATCG ATTCTGATAC TCTGGGTCTG AATGGTTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC CGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACCAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGG 
GCAGGCTACT GATTCAGCTA AAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TTAAAGCCGC 
GAGCGAAGGT AGTGACGGTG CTTCTCTGAC ATTCAATGGC ACTGAATATA CTATCGCAAA 
AGCAACTCCT GCGACAACCT CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATT ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCTATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CTGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
TATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCCAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
AACAAATCTCAGTCTTCTCTTAGCTCTGCTA

TTGAGCGTCTGTCTTCTGGTCTGCGTATTAACAGCGCAAAAGACGATGCAGCAGGTCAGG 

CGATTGCTAACCGTTTTACGGCAAATATTAAAGGTCTGACCCAGGCTTCCCGTAACGCAA 

ATGATGGTATTTCTGTTGCGCAGACCACTGAAGGTGCGCTGAATGAAATTAACAAC^ 

TGCAGCGTATTCGTGAACTTTCTGTTCAGGCAACTAACGGTACTAACTCTGACAGCGATC 

TTTCTTCTATCaVGGCTGAAATTACrCAACGTCrcGAAGAAATTC 

AAACTCAGTTTAACGGCGTGAAAGTCCTTGCTGAAAATAATCAAATGAAAATTCAGGTTG 

GTGCTAATGATGGTGAAACCATCACTATCAATOTGGCAAAAATTGATGCGAAAACTCT 

GCCn^ACGGTTTTAATATCGATGGCGCGCAGAAAGCAACAGGCAGTGACCTGATTTCTA 

AATTTAAAGCGACAGGTACTGATAATTATGATGTTGGCGGTAAAACTTATACCGTGAATG 

TGGAGAGCGGCGCXSGTTAAGAATGATGCTAATAAAGATGTTTTTGTAAGCGC^^ 

GATCGCTGACGACCAGTAGTGATACTAAAGTATCCGGTGAAAGTATTGATGCAACAGAAC 

TAGCGAAACTTGCAATAAAATTAGCTGACAAAGGCTCCATTGAATACAAGGGCATTACAT 

TTACTAACAACACTGGCGCAGAGCTTGATGCTAATGGTAAAGGTGTTT^ 

TTGATGGTCAAGATGTTOUVTTTACTATTGACAGTAATGCACCCAO^ 

CAATAACTACAGACACAGCTGTTTACAAAAAa^GTGCGGGCCAGTTC^^ 

TGGAAAATAAAGCCGCAACACTCTCTGATCTGGATCTTAATGCAGCCAAGAAAAC^^ 

GCACTTTAGTTGTAAATGGCGCCACCrrACAATGTCAGCGCAGATGGTAAAACGGT^ 

ATACTACTCCIX3GTGCCCCTAAAGTGATGTATCTGAGCAAATCAGAAGGTGGTAGC^ 

TTCTGGTAAACGAAGATGCAGCAAAATCGTTGCAATCTACCACCAACCa5^ 

TCGACAAGGCATTGGCTAAAGTTGACa^TCTGCGTTCTGACCTaSGTGCa^GTACa;^ 

GTTTCGACTOTGCCATCACCAACCTTGGC^CACCGTAAACAACCTGTC^ 

GCCGTATCGAAGATGCTGACTACGCGACCGAAGTGTCTAACATGTCTCGTGCGCAGATCC 

TGCAACAAGCGGGTACCTCTGTTCTGGCGCAG 
ATGGCACAAGTCATTAATACCAACAGCCTCTCG

CTGATCACTCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGT 

CTGTCTTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCGGGTCAGGCGATTGCT 

AACCGTTTTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCA^ 

ATTTCCGTTGCACAGACCACTGAAGGCGCGCTGTCCGAAATTAACAACAACTTACA^ 

ATTCGTGAACTGACGGTTCAGGCTTCTACCGGGACTAACrCCGATTCGGATCTGGACTCC 

ATTCAGGACGAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGCCAGACCCAG 

TTCAACGGCGTGAACGTGCTGTCCyUU^GATGGCTCGATGAAAATTCAGGTCGGCGCGAAC 

GATGGCGAAACGATTACTATTGATCTGAAGAAAATTGACTCTGATACGCTGAATCTGGCT 

GGTTTTAACGTTAACGGTAAAGGTTCTGTAGCGAATACAGCTGCGACAAGCGACGA^ 

AAACTGGCTGGTTTCACTAAGGGCACCACAGATACCyVATGGCGTGACCGCGTATAC^ 

ACT^TTAGTAATGACAAAGCCAAAGCTTCCGATCTGTTAGCTAATAT^ 

GTGATCACTGGGGGAGGGGCAAACGCTTTTGGCGTGGCTGCAAAGAATGGTT

GATGaiGCAAGTAAATCTTATAGTTTTGCTGCAGATGGTGCCGATTCAGCGAAGAC^ 

AGCATCATTAATCCAAACACCGGTGATTCGTCGCAGGOGACAGTGACTATTGGT^ 

GAGCAGAAAGTTAATATTTCCCAGGATGGAAAAATTACTGCGGCAGATGATAATGCGACG 

CTGTATTTAGATAAACAGGGAAACTTGACAAAAACGAATGCAGGTAACGATAC 

ACTTGGGATGGTTTAATTTCCAACAGCGATTCTACCGGTGCGGTTCCAGT^ 

ACTACAATTACAATTACTTCTGGTACAGCrTCCGGAATGTCTGTTC^^ 

GGAATTCAGACCTCAACAAATTCTCAGATTCTTGCAGGTGGTGCATT^ 

AGTATTGAGGGAGGCGCTGCTACAGACATTTTGGTAGCAAGTAATGGAAA^^ 

GCTGATGGTAGTGCACTTTATCTTGATCCGACTACTGGTGGATTCACT^ 

GGAAATACAGCTGCTTCGTTAGATAATTTAATTGCTAACAGTAAGGATGCTACCTT^ 

GTAACTTCAGGTACCGGCCAGAACACTGTTTATAGCACAACAGGAAGTGGCGCTCAGTTC 

ACCAGTTTAGCAAAAGTAGACACAGTCAATGTCACCAACGCACATGTCAGTGCCGAAGGT 

ATGGCAAATCTGACAAAAAGCAATTTTACCATTGATATGGGCGGTA^ 

TACy^CAGTTTCCM.TGGGGATGTGAAAGCTGCTGCAAATGCTGATGl^ 

GGTGCACTTTCAGCCAATGCTAOUUUlGATGTAACCTACTTTGAACAAAAAAAT^^ 

ATTACCAAOVGCACCGGTGGTACCATCrrATGAAACAGCTGATGGTAAGTTAACAACAG;^ 

GCTACrrACrGCATCCAGTTCCACCGCCGATCCCCTGAAAGCTCTGGACGAAGC» 

TCCATCGACTkAATTCCGCTCCTCCCTCGGTGCGGTGCAAAACCGTCI^ 

ACCAACCTGAACAACACCACTACCAACCTGTCCGAAGCGCAGTCCCGTATO 

GACTATGCGACOSAAGTGTCCAACATGTCGAAAGCGCAGATCATCCAGCAGGCCG 

TCCGTGCTGGCAAAAGCTAACCAGGTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGC

CTCTCGCTGATCACTCAAAATAATATC7ACAAGAACCAGTCTGCGCTGTCGAGTTCTATC 

GAGCGTCTGTCTTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCGGGTCAGGCG 

ATTGCTAACCGTTTTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCC^ 

GACGGTAlT'TCTGTTGCGCAGACCACCGAAGGaSCGCTGTCCGAAATTAACAACA^ 

CAGCGTGTGCGTGAGCTGACTCTTCAGGCGACCTlCCGGTACTAACTCTCAGTCTGACCr^ 

TCTTCTATCCAGGACGAAATOU^TCTCGCCTGGAAGAGATTGATCGTGTTTCAAGTCAG 

ACTCAATTTAACGGCGTGAATGTTTTGGCTAAAGATGGGAAT^TGAACATTCAGGTTGGG 

GGAAATGATGGACAGACTATCACTATTGATCrGAAAAAGATCGATTCATCTACACT;^ 

CTCTCCAGTTTTGATGCTACAAACTTGGGCACCAGTGTTAAAGATGGGGCCAC^ 

AAGCAAGTGGCAGTAGGTGCTGGCGACTTTAAAGATAAAGCTTCAGGATCGTTAGGTACC 

CTAAAATTAGTTGAGAAAGACGGTAAGTACTATGTAAATGACACTAAAAGTAGTAAGTAC 

TACGATGCCGAAGTAGATACTAGTAAGGGTAAAATTAACTTCAACrCTACAAATGAAAGT 

GGAACTACTCCTACTGCAGCGACGGAAGTAACTACrK3TTGGCaK:GATGTAAAATTGGAT 

GCTTCIXSCACTTAAAGCCAACCAATCGCTTGTCGTGTATAAAGATAAAAGCTO 

GCITATATCATTCAGACCAAAGATGTAACAACTAATCAATCAACITTC^ 

ATCAGTGATGCTGGTGTTTTATCTATTGGTGCATCTACAACCGCGCCAAGC^ 

GCTAACCCGCTTAAGGCTCTTGATGATGCAATTGCATCrcTTGATAAATTCCGCTCTTCT 

CTCGGTGCCGTTCAGAACCGTCTGGATTCTGCCATTGCOVACCTGAAa^ 

AACCTGTCTGAAGCGCAGTCCOSTATTCAGGACGCTGACTATGCGACCGAAGTGTCCAAC 

ATGTCGAAAGCGCAGATTATCCAGCAGGCCGGTAACTCCGTGCTGGCAAAAGCCAACCAG 

GTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 



Figure 16 






AACAAATCTCAGTCTTCTCTGAGCTCCGCCAT 

TGAACGTCTCTCTTCTGGCCTGCGTATTAACAGTGCTAAAGATGACGCAGCAGGTCAGGC 

GATTGCTAACCGTTTTACAGCAAATATTAAAGGTCTGACTCAGGCTTCCCGTAACGCGAA 

TGATGGTATTTCTGTTGCGCAGACCACTGAAGGTGCGCTGAATGAAATTAAC^CAACCT 

GCAGCGTGTACGTGAACTGACTGTTCAGGCAACTAACGGTACTAACTCTGACAGCGATCT 

TTCTTCTATCCAGGCTGAAATTACTCAACGTCTGGAAGAAATTGACCGTGTAT^ 

AACTCAGTTTAACGGCGTGAAAGTCCTTGCTGAAAATAATGAAATGAAAATTCAGGTTGG 

TGCTAATGATGGTGAAACCATCy^CTATCAATCTGGCAAAAAITGATGCGAAAACTCTCGG 

CCTGGACGGTTTTAATATCGATGGCGCGCAGAAAGCAACTGGCAGTGACCTGATTTCT 

ATTTAAAGCGACTIGGTACTGATAACTATGATGTTGGCGGTGATGC]^ 

AGATAGCGGAGCTGTTAAAGATACTACAGGGAATGATATTTTTGTTAGTGCAGCAGATGG 

TTCACTGACAACTAAATCTGACAaVAACATAGCTGGTACAGGGATTGATGCTAC^^ 

CGCAGCAGCGGCTAAGAATAAAGCACAGAATGATAAATTCACGTTTAATGGAGTTGAATT 

CACAACAACAACTGCy^GOSGATGGCAATGGGAATGGTGTATATTC^^ 

TAAGTCAGTGACATTTACn^TGACAGATGCTGACAAAAAAGCTTC^^ 

GACAGTTTACAAAAATAGCGCTGGCCTTTATACGACTUVCa^ 

CACACTTTCCXSATCTTGATCTCAATGCAGCTAAGAAAACAGGAAGCACGTTAGT^ 

CGGTGCT^CTTACGATGTTAGTGCAGATGGTAAAACGATAACGQAGACTGCTTC^ 

CAATAAAGTCATGTATCTGAGCT^TCAGAAGGTGGTAGCCCGATTCTGGTAAACGAA^ 

TGCAGCAAAATCGTTGCAATCTACCACCAACCCGCTCGAAACTATOSACAAAGCA 

TAAAGTTGACAATCTGCGTTCTGACCTCGGTGCAGTACAAAACCGTTTCGACTC^ 

CACCAACCTTGGCAACACCGTAAACAACCrGTCTTCTGCCCGTAGCCGTATCGAAGATC 

TGACTACGCGACCGAAGTGTCTAACATGTCTCGTGCGCAGATCCTGCAACAAGCGGGTAC 

CTCTGTTCTGGCGCAG 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCA

CTCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTT 

CTGGCTTGCGTATTAACAGCGCGAAGGATGACGCAGCGGGTCAGGCGATTGCTAACCGTT 

TCACCTCTAACATTAAAGGCCTGACTCAGGCGGCCCGTAACGCCAACGACGGTATCTCCG 

TTGCGCAGACCACCGAAGGCGCGCTGTCCGAAATCAAOU^CAACTTACAGCGTATCCGTG 

AACTGACGGTTCAGGCTTCTACCGGGACTAACTCCGATTCGGATCTGGACTCCATTCAGG 

ACGAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCTGGCCAGACCCAGTTCAACG 

GCGTGAACGTACTGGCGAAAGACGGTTCAATGAAAATTCAGGTTGGTGCGAATGACGGCC 

AGACTATCACGATTGATCTXSAAGAAAATTGACTCAGATACGCTGGGGCTGAATGGT^ 

ACGTGAATGGTTCCGGTACGATAGCCAATAAAGCGGOSACCATTAGCGACCTGACAGCAG 

CGAAAATGGATGOTGCAACTAATACTATAACTACAACT^TAATGCGCrro 

AGGCGCTTGATCAACTGAAAGATGGTGACACTGTTACTATCAAAGCAGATGCTGCTC;^ 

CTGCCACGGTTTATACATACTATGCATCAGCTGGTAACTTCTCATTCAGTAATGT^^ 

ATAATACTTCAGCTU^GCAGGTGATGTAGCAGCTAGCCTTCTCCCGCCGGCTGGGCAAA 

CTGCTAGTGGTGTTTATAAAGCAGCyAGaSGTGAAGTGAACr^^ 

GTAAAATCTlCAATCGGAGGACAGAAAGaVTATTTAACTAGTGATGGTAACTTAA 

ACGATGCTGGTGGTGCGACTGCGGCTACXSCITGATGGTTTATTC^^ 

GTCAATC7ATCGGGTTTAAGAAGACTGCATCAGTCACGATGGGGGGAACAACTTATAACT 

TTAAAACGGGTGCTGATGCTGATGCrGCAACTGCTAACGCAGGGGTATCGTTCAC^ 

CAGCTAGCAAAGAAACCGTTTTAAATAAAGTGGCTA^GOTAAACAAGGC^^ 

CAGCTGACGGTGATACATCCGCAACAATTACCTATAAATCTGGCGTTCAGACGTATCAGG 

CTGTATTTGCaSCAGGTGACGGTACTGCTAGCXKIAAAATATGCC^ 

CTAATGCAACAGCAAO^TACACTGATGCTGATGGTGAAATGACTACAATTC^ 

CCACGAAGTATTCAATCGATGCTAACAACGGCAAGGTAACTGTTGATTCTGGAACTGGTA 

CGGGTAAATATGCGCCGAAAGTAGGGGCTGAAGTATATGTTAGTGCTAATGGTACTTTAA 

CAACAGATGCMCTAGCGAAGGCACAGTAACAAAAGATCCACTGAAAGCTCTGGAT^ 

CTATCAGCTCCATCGACAAATTCCGTTCTTCCCTGGGTGCTATCCAGAACCGTCTGGATT 

CCGCAGTCACCAACCTGAACAACACCACTACCAACOTGTCCGAAGCGCAGTCCOT^ 

AGGACGCCGACTATGCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATCATTCAGCAGG 

CCGGTAACTCCGTGCTGGCAAAAGCCAACOVGGTACCGCy^GCAGGTTCTGTCTCTGCTC^ AGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATA

ATATCAACAAGTVACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGC 

GTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTTACTTCTA 

ACATTAAAGGCCTGACTCAGGCTGCTICGTAACGCCAACGACGGTATTTCCGTTGCGC 

CCACTGAAGGTGCGCTGTCCGAAATCAACAACAACTTACAGCGTATTCGTGAGCTGACGG 

TTCAGGCTTCTACCGGGACTAACTCCGATTCTGACCTGGACTCCATCCAGGACGAAATCA 

AGTCTCGTCTGGACGAAATTGACCGCGTATCCGGTCAGACCCAGTTCAACGGCGTGAACG 

TGCTGGOSAAAGACGGTTCGATGAAAATTCAGGTTGGTGCGAATGACGGCCAGACTATCA 

CGATTGATCTGAAGAAAATTGACrCAGATACGCTGGGGCTGAGTGGGTTTAATGTGAATG 

GTGGCGGGGCTGTTGCTAACACrKXn*GCATCTAAAGCTGACTTGGTAGCTGCTAATGC^ 

CTGTGGTAGGCAACAAATATACTGTGAGTGCGGGTTACGATGCTGCTAAAGCGTCTGATT 

TGCTGGCTGGAGTTAGTGATGGTGATACTGTTCAGGCAACCATTAATAACGGCTTCGGAA 

CGGCGGCTAGTGCAACGAATTACAAGTATGACAGTGCAAGTAAGTCTTACTCTTT^ 

CCACAACGGCTTCAGCTGCCGATGTTOIGAAATATTTGACCCCGGGCGTTGGTGATACCG 

CTAAGGGCACTATTACTATCGATGGTTCTGCACAGGATGTTCAGATCAGCAGTGAT^ 

AAATTACGTCAAGCAATGGAGATAAACITOACATTGATACAACTGG 

ACGGCTTTAGTGCTTCTTTGACTGAGGCTAGTCTGTCCACACTTGCAGCCAATAAT^ 

AAGCGACAACCATTGACyVTTGGCGGTACCTCTATCTCCTTTACCGGTAATAGTACTAC^ 

CGAA»CTATTACTTATTCAGTAACAGGTGCyuUU«3TTGATCaVGGCAGC^ 

CnSTATCAACCTCTGGAAACGATGTTGATTTCACTACCGCAGGTTATAGCGTCGACGGCG 

CAACTGGCGCTGTAACAAAAGGTGTTGCTCCGGTTTATATTGATAACAACGGGGCGTTG^ 

CCyVCATCrrGATACTGTAGATTTTTATCTACAGGATGATGGTTCAGTGA(^ 

GTAAGGCAGTTTATAAAGATGCTGACGGTAAATTGACGACAGATGCTGAAACTAAAGCTG 

CAACCACCGCCGATCCCCTGAAAGCTCrGGACGAAGCCATCyVGCTCCATCGACAAATTCC 

GCTCCTCCCTCGGTGCGGTGCAGAACCGTCrGGATTCCGCGGTCACCAACCTGAACAACA 

CCACTACCAACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCTGACTATGCGACCGAAG 

TATCCAACATGTCGAAAGCGCAGATCATCCAGCAGGCCGGTAACTCCGTGCTGGa^AAAG 

CTAACCAGGTACCACAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGC

CTCTCGCTGATCACTCAAAATAATATCAACAAGAACCAGTCI^GCTGTCGAGTTCTATC 

GAGCGTCTGTCTTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCG 

ATTGCTAACCGTTTTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCC^ 

GACGGTATTTCTGTTGCACAGACCACTGAAGGCGCGCTGTCCGAAATCAACAACAACTTA 

CAGCGTGTGCGTGAACTGACCGTTCAGGCAACCACCGGTACCAACTCCCAGTCTGACCTG 

GACTCTATCCAGGACGAAATTAAATCCCGTCK3GACX3AAATTGATCGCGTATCCGGT^^ 

ACCCAGTTCAACGGCGTGAACGTGCTGGCAAAAGACGGTTCCATGAAAATTCAGG^ 

GCGAACGATGGCCAGACCATCACTATCGACCTGAAGAAGATTGACTCTTCTACCTTGA^^ 

CTGACAGGTTTTAACGTTAACGGTTCTGGTTCTGTGGOSAATAC^ 

GATTTAACCGCTGCTCTVACTCTCTGCACCGGGTGCAGCAGACGa^TGGTAC^^ 

TATACTGTCAGTGCTGGTTATAAAGAATCCACTGCTGCAGATGTTATTGCTAGC^^ 

GACGGCAGTGCTCCGACTTCTGCAATTACTGCAACCATTAATAATGGCTTCGGTGATTCC 

AGTGCGCTGACriTCCAATGACTATACITATGACCCAGCAAAAGGC^ 

GTAGCTTCAAGCGCCAATAATACrcCTGCCaiGGTTCAGTCCTTCCTG^ 

GGTGATACCGCAAATCrGAAAGTAACCXSTTGGTACGAOVTCGGTTGATGTCGT^ 

AGTGATGGTAAGATTACAGCAAAAGATGGTTCTGCATTATATATCGACAGTACAGGTAAC 

CTGACTCAGAACAGTGCTGGCTTGACCTCTGCTAAACTGGCTACTCTGACr^^ 

GGCTCTGGTGTTGCTTCAACCATCACTACroAAGATGGCaCT 

AACGGTAATATTGGTCTGACCGGTGTTCGTATCAGTGCTGATTCTCTGCAGTCAGCGACT 

AJUVTCTACGGGCTTTACTGTTGGTACTGGCGCTACAGGTCT^^ 

AAAGTGACTATCGGCGGGACTACTGCTCAGTCCTACACCAGCAAAGATGGT^^ 

ACTGATAACACCACTAAACTGTATCTGCAGAAAGATGGCTCTGTAACCAAC^ 

AAAGCGGTCTATGTAGAAGCGGATGGTGATTTCaiCTACCGACGCTGCAACC^ 

ACCACCACCGATCCXSCTGAAAGCCCTGGATGAGGCAATCAGCaVGATCGATAAGTTCC^ 

TCATCCCTGGGTGCTATCCAGAACCGTCTGGATTCCGCGGTCACCAACCTGAACAACACC 

ACTACCAACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTG 

TCCAACATGTCGAAAGCGCAGATCATTCAGCAGGCCGGTAACTCan'GCTGG 

AACCAGGTACCGCAACAGGTTCTGTCTCTGCTGCAGGGCTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCAC

TCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTC 

TGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTT

TACITCTAACATTAAAGGCCrGACTCAGGCTGCy^CGTAACGCCAACGATC 

TGCACAGACGACTCAAGGCGaSCTGTCCGAAATCAACTACAACTTAC^ 

ACTGACCGTTCAGGCAACCACCGGTACCAACTCCCAGTCTGACCTGGACTCTATCCAGGA 

CGAAATTAAATCCCGTCTGGACGAAATTGATCGCGTATCCGGTCAGACCCAGTTCAACGG 

CGTGAACGTGCTGGCAAAAGACGGTTCCyiTGAAAATTCAGGTTGGCGCGAACGATGG^ 

GACCATCACTATCGACCTGAAGAAGATTGACTCTTCTACCTTGAACCTGACAGGTTT^ 

CGTTAACGGTTCTGGTTCTGTGGCGAATACTGCAGCAACTAAAGCTGATTTAACCGC^ 

TCAACTCTCTGCACCGGGTGCAGCAGACGa^TGGTACAGTTACTTATACTGTCAGTGC 

TGGTTATAAAGAATCCACTGCTGCAGATGTTATTGCTAGCATCAAAGACGGCAGTGCTCC 

GACTTCTGCAATTACTGCAACCATTAATAATGGCTTCGGTGATTCCAGTGCGCTGACTO 

CAATGACTATACTTATGACCCAGCAAAAGGCGACTTCACTrACGACGTAGCCT 

CAATAATACTGCTGCCCAGGTTCAGTCCTTCCTGACGCCGAAAGCAGGTGATACC^ 

TCTGAAAGTAACCGTTGGTACGACATOSGTTGATGTCGTTCTGGCCAGTGAT^ 

TACAGCTVAAAGATGGTTCKKlATTATATATCGACAGTAa^GGTA^ 

TGCTGGCTTGACCTCTGCTATVACTGGCTACTCTGACTGGCCnTCAGGGCTC^^ 

TTCAACCATCACTACTGAAGATGGCACTAATATTGATATTGCTGCTA^ 

TCTGACCGGTGTTCGTATCAGTGCTGATTCrCTGCAGTCAGCGACTAAATCTACGGGCTT 

TACTGTTGGTACTGGCGCTACAGGTCTGACCGTAGGTACTGATGGTAAAGTGACTATCGG 

CGGGACTACTGCTCAGTCCTACACCAGCAAAGATGGTTCCCTGACTACTGATAACACC^^ 

TAAACTGTATCTGCAGAAAGATGGCTCTGTAACCAACGGTTCAGGTAAAGCGGTCTATG^ 

AGAAGCGGATGGTGATTTCACTACCGACGCTGCAACCAAAGCCGCAACCACCACCGATCC 

GCTGAAAGCCCTGGATGAGGCAATCAGCCAGATCGATAAGTTCCGTTCATCCCTGGGTGC 

TATCCAGAACCGTCTGGATTCCXXrGGTCACCAACCTGAACAACACaVCTACC^ 

TGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAACATGTCGAA 

AGCGCAGATCATTCAGCAGGCCGGTAACTCCGTGCTGGCAAAAGCCAACCAGGTACCGCA 

ACAGGTTCTGTCTCTGCTGCAGGGCTAA
GCGCTGTCGACTTCTATCGAGCGCCTCTCTTCTGGTCTGCGTATTAACAGCGCTAAA

GATGACGCTGCGGGCCAGGCGATTGCTAACCGCTTCACIT'CTAACATCAAAGGTCTGACT 

CAGGCCGCACGTAACGCCAACGACGGTATTTCTCTGGCGCAGACGGCTGAAGGCGCGCTG 

TCAGAGATTAACAACAACTTGCAGCGTATTCGTGAACTGACCGTTCAGGCCTCT^ 

ACGAACrrCTGATTCCGACCTGTCTTCTATTCAGGACGAAATCAAATCCCGTCTTGATGAA 

ATTGACCGTGTATCTGGTCAGACCCAGTTCAACGGTGTGAACGTGCTGTCGAAAAACGAT 

TCGATGAAGATTCAGATTGGTGCCAATGATAACCAGACGATCAGCyVTTGGCTTGC^ 

ATCGACAGTACavCTTTGAATCTGAAAGGATTTACCGTGTCCGGCATGGCGGATTTCAGC 

GCGGCGAAACTGACGGCTGCTGATGGTACAGCAATTGCTGCTGCGGATGTO^GGA^ 

GGGGGTAAACAAGTCTUITTTACTGTCTTACACTGACACCGCGTCTAACAGT^ 

GCGGTCGTTGATTCTGCAACCGGTAAATACATGGCAGCOVCTGTAGTCATTAC«^ 

GCGGCGGCGGTAACTGTTGGTGCAACGGAAGTGGOSGGAGCCGCTACAGCCGAACCGTTA 

AAAGCACTGGATGCCGCAATCGCTAAAGTCGACAAATTCCGCTCCTCCCTCGGTGCCGTT 

CAAAACCGTCTGGATTCTGCGGTCACCAACCTGAACAACACCAC»C^ 

GCGCAGTCCCGTATTOVGGACGCCGACTATGCGACCGAAGTGTCC/VACATGTCGAAAGCG 

CAGATTATCCAGCAGGCG 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATA

ATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGC 

GTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTTACTTCTA

ATATTAAAGGCCTGACrCAGGCTGCACGTAACGCCAATGACGGTATTTCTGTTGCACAGA 

CCACTGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGTATTCGTGAACTGA^^^ 

TTCAGGCCACTACAGGGACTAACTCCGATTCTGACCTGGACTCCATCCAGGACGAAATCA 

AATCTCGTCTGGACGAAATTGACCGCGTATCCGGTCAGACCCAGTTCAACGGCGTGAACG 

TGCTGTCCAAAGATGGTTCAATGAAAATTCAGGTCXSGCGCAAATGATGGTGAAACCATav 

CGATTGATCTGAAGAAAATTGACTCTGATACGCTGAATCTGGCTGGTTTTAACGTO

GCGAAGGTGAAACAGCCAATACTGCTGCAACACrTAAAGATATGGTTGGTTTAAAACT 

ATAATACGGGGGTCACTACAGCTGGAGTTAATAGATATATTGCTGACAAAGCCGTCGCAA 

GTAGCACGGATATTTTGAATGCGGTAGCTGGTGTTGATGGCAGTAAAGTTTC 

CAGATGTTGGTTTTGGTGCAGCTGCCCCIXSGTACGCCAGTGGAATATACTO^ 

ATACTAACACATATACGGCTTCTGCTTCAGTTGATGCGACTCAACTGGCGGCATTCC^ 

ATCCTGAAGCGGGTGGTACCACTGCTGOUICAGTAAGTATTGGCAACGGTACAA^^ 

AAGAGCAAAAAGTCATTATTCCTAAAGATGGTTCTTTAACTGCTGCT^ 

CTCTCTATCTTGATGATACTGGTAACTTAAGTAAAACTAACGCAGGCACTGATACTCAAG 

CTAAACTGTCTGACTTAATGGCAAACAATGCTAATGCCAAAACAGTCATTAC^ 

AAGGTACATTTACTGCTAATACGACAAAGTTTGATGGGGTAGATATTTCTGTTC 

CAACGTTTGCTAACGCCGTTAAAAATGAGACTTACACTGCAACro 

CTGCGACATATACAGTCAATAATGGCTVCTGCTGCATCAGCGTATTTAGTCGATGGAAA^ 

TGAGCAAAACTCCTGCCGAGTATTTTGCTCAAGCTGATGGCACTATTACTAGT^ 

ATGCGGCTACCAGTAAAGCTATCTATGTAAGTGCCAATGGTAACTTAACGACTAATACAA 

CTAGTGAATCTGAAGCTACTACCAACCCGCrGGCAGCATTGGATGACGCTATCGCGTCTA 

TCGACAAATTCCGTTCITCCCrGGGTGCTATCCAGAACCGTCTGGATTCCGCAGT^ 

ACCTGAACAAO^CCACTACCAACCrcTCTGAAGCGmGTCCCGTATTCAGGAC 

ATGCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATCATTCAGCAGGCCGGTAACrrCCG 

TGCTGGCAAAAGCCAACCAGGTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATAATAT

CAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGCGTAT 

TAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTTACTTCTAACAT 

TAAAGGCCTGACTGAGGCTGCACGTAACGCCAACGAOSGTATTTCTGTTGCGCAGACCAC 

TGAAGGCGCGCTGTCCGAAATTAAOU^CAACTTACAGCGTATTCGTGAACTGACGGTTCA 

GGCGACGACCGGAACTAACTCCACCTCTGACCTGGACTCCATCCAGGACGAAATCA^ 

CCGTCTTGACGAAATTGACCGCGTATCTGGTCAGACCCAGTTCAACGGCGTGAACGTGCT 

GTCTAAAGATGGCTCGATGAAAATTCAGGTCGGCGCGAACGATGGCGAAACGATTACTAT 

TGATCTGAAGAAAATTGACTCTGATACGCTGAATCTGGCTGGTTTTAACGTTAACGGTAA 

AGGTTCTGTAGCGAATACCGCTGCGACTACAGATAATCTGACATTGGCTGGTTTTACAGC 

GGGTACTAAAGCTGCTGATGGO^CCGTAACTTATAGCAAAAATGTCCAGTTTGCCGCCGC 

GACTGCAAGCAATGTACTGGCTGCTGCTAAAGATGGCGACGAAATTACGTTCGCTGGTAA 

TAACGGCACyiGGTATAGCTGCAACTGGGGGGACTTATACTTATCATAAGGACTCTA^ 

ATACAGCTTTAGCGCM.CGGCTGCATCTAAAGATTCTCTGTTGAGCACACr^ 

CGCTGGCGATACATTTACCGCTAAAGTGACTATTGGTTCTAAATCGCAAGAAGTTAACGT 

TAGCAAAGATGGTACGATTACATCC7UX:GATGGTAAGGCGCTGTAT^ 

Cyy^CCTGACCGAAACAGGTAGTGGOVaU^CCAAAGCTGCAACCTGGGATA^ 

CAATACAGATACTACAGGCAAAGATGCCTATGGTAACTCTGCGGCAGCAGCTGrrGGGAC 

AGTAATCGAAGCAAAAGGAATGACCATCACTTCrcCTGGTGGTAATGCTa^GGTGCT 

AGACGCX^CTTATAATGCCGCATATGOGACCTCAATTACTAOTGGTACTCCGGGT^ 

GGGAGCCGCGGGAGCCGCTGCAACTGCGGGTAATGCCGCGGTGGGAGCGCTGGGOG^ 

GGCAGTTGATAATACCACGGCAGATGTTGCCGATATCTCTATCTCAGCrTCGCAAAT^ 

GAGCATCCTTCAGGATAAAGATTTCACCTTAAGTGATGGTAGTGATACrTACAACGT^ 

CAGCAATGCTGTCACTATCAATGGCAAAGCAGCAAACATTGATGACAGCGGOT 

AGACC^AACa^GTAAAGTTGTCAATTATTTCGCTCATACTAACGGTAGCGTGACTAACGA 

TACAGGCTCCACTATTTATGCGACAGAAGATGGTAGCCTGACCACCGATGCAGCAACCAA

AGCCGAAACCACCGCCGATCCCCTGAAAGCrrCrrGGACGAAGCCATCAGCTCCATCGACAA 

ATTCCGCTCCTCCCTCGGTGCGGTGCAAAACCGTCTGGATTCCGCX3GTCACCA^

CAACACCACCACC7ACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGAC

CGAAGTGTCCAACATGTCGAAAGCGCAGATTATCCAGCAGGCCGGTAACTCCGTGCTGGC 

AAAAGCTAACCAGGTACCy^OlGCAGGTTCroTCTCTGCrrcCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTG

ATCyVCTCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTG 

TCTTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAAC 

CGTTTTACTTCTAACATTAAAGGCCTGACTCAGGCGGCCCGTAACGCCAACGACGGTATT 

TCTGTTGCGCAGACCACCGAAGGCGCGCTGTCCGAAATTAACAACAACTTACAGCGTGTG 

CGTGAGCTGACTGTTCAGGCGACCACCGGTACCAACTCCCAGTCTGATCTGGACTCTATC 

CAGGACGAAATCy^TCCCGTCTGGACGAAATTGACCGCGTATCCGGTCAGACCCAGT^ 

AACGGCGTGAACGTGCTGGCAAAAGACGGTTCCATGAAAATTCTIGGTTGGCGCGAATGAT 

GGCCAGACCATCACTATCGACCTGAAGAAGATTGACTCTTCTACGTTGAAACI^ 

TTTAACGTGAATGGTTCTGGTTCTGTGGCGAATACTGCGGCGACTAAAGCXSGATTTC^ 

GCTGCTGCAATTGGTACCCCTGGGGCAGCAGATTCTACAGGTGCCATTGCl^^ 

AGTGCTGGGCTGACTAAAACTACAGCCGCAGATGTACTGTCTAGCCTCGCTGATGGTACG 

ACTATTACAGCCACAGGCGTGAAAAATGGCTTTGCTGCAGGAGCCACTTCCAATGCCTAT 

AAACTTAACAAAGATAATAATACATTTACITATGACACGACTGCTACGACAGCT^ 

CAGTCTTACCTGACTCCGAAAGCGGGCGACACTGCAAO^TTCAGTGTTGAAATT^ 

ACTACACAAGACGTCGTGCTGTCCAGTGATGGCAAACTCACTGCTAAGGATG^ 

CITTACATTGATACAACTGGTAATTTAACTCAGAATGGTGG 

CTCGCGGAAGCGACTCTGAGTGGTTTAGCTCTGAACAAAAATGGTTTAAaSGCTG 

TCCACAATTACTACAGCTGATAAO^CTTCGATTGTACTGAATGGTTCAAGCGATGGTACT 

GGTAATGCTGGTACTGAAGGTACGATTGCTGTTACAGGCGCTGTAATTAGTTC^^ 

CTGCAATCTGCAAGCAAAACGACTGGTTTCACTGTTGGTACAGTAGACACA^ 

ATCTCTGTAGGTACTGATGGGAGTGTTCAGGCATATGATGCTGCGACITCTGGC^ 

GCTTOTTACACCAACACTGACGGTACACTGACTACTGATAACACCACTAAACTGTATCTG 

CAGATAGATGGCTCTGTAACCAACGGTTCAGGTAAAGCGGTCTATGTAGAAGCGGATGGT 

GATTTCAOTACCGACGCTGCAACCAAAGCCGCAACCACaVCCGATCCGCTGGCCG 

GATGACGCAATCAGCCAGATCGACAAGITCCGTTCATCCTTGGGTGCTATCCAGAACCGT 

CTGGATTCrcCAGTCACa^CCTGAACAACACCACCACCAACCTGTCTGAAGCG»^ 

CGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGTCGAAAGCGCAGATCATC 

CAGOIGGCCGGTAACTCCGTGCTGGCAAAAGCCAACCAGGTACCGCAQCAGQTTCTGTCT 

CTGCTGCAGGGTTAA 
AACAAATCTCAGTCTTCTCTGAGCTCCGCCATTGAA

CGTCTCTCTTCTGGCCTGCGTATTAACAGTGCTAAAGATGACGCAGCAGGTCAGGCGATT 

GCTAACCGTTTTACAGCAAATATTAAAGGTCTGACTCAGGCTTCCCGTAACGCGAATGAT 

GGTATTTCTGTTGCGCAGACCACTGAAGGTGCGCTGAATGAAATTAAC^ 

CGTATTCGTGAACTTTCTGTTCAGGCAACTAACGGTACTAACTCTGACAGCGATCTTTCT 

TCTATCCAGGCTGAAATTACTCAACGTCIXKSAAGAAATTGACCGTGT^^ 

CAGTTTAACGGCGTGAAAGTCCTTGCTGAAAATAATGAAATGAAAATTCAGGTTGGTGCT 

AATGATGGTGAAACCATCACTATCy^TCTGGCAAAAATTGATGCGAA^ 

GACGGTTTTAATATCGATGGCGCGCAGAAAGCAACCGGCAGTGACCTGATTTCT^ 

AAAGCGACAGGTACTGATAATTATCAAATTAACGGTACTGATAACTATACTGTTAATGTA 

GATAGTGGCGTAGTACAGGATAAAGATGGCAAACAAGTTTATGTGAGTACTGCGGATGGT 

TCACTTACGACCAGCAGTGATACTCAATTCAAGATTGATGCAACTAAGCTTGCAGT^ 

GCTAAAGATTTAGCTCAAGGGAATAAGATTGTCTACGAAGGTATCGAATTTACAAATACC 

GGCACTGTCGCTATAGATGCC7JAGGTAATGGTAAATTAACCGCCAATGTTGATGGTAAG 

GCTGTTGAATTCACTATTTOSXKSGAGTACTGATACATCA 

CCTACGACAGCCCTATACAAAAATAGTGCAGGGCAATTGACTGCAACAAAAG 

AAAGCAGCGA^CTATCTGATCTTGATCTGAACGCTGCCAAGAAAACAGGA^ 

GTTGTTAACGGTGGAACTTAOSATGTTAGTGCAGATGGTAAAACGATAACGGAGAC^ 

TCTGGTAACAATAAAGTOITGTATCTGAGCAAATCAGAAGGTGGTAGCCCGATTCTGGTA 

AACGAAGATGCAGCAAAATCGTTGCAATCTACaVCCAACCaSCXroAAACTATCGA 

GCATTGGCTAAAGTTGACAATCTGCGTTCTGACCTCGGTGCAGTACAAAA 

TCTGCCATCACCAACCTTGGCAACACCGTAAACAACCTGTCTTCTGCCCGTAGCCGTATC 

GAAGATGCTGACTACGCGACCGAAGTGTCTAACATGTCTCGTGCGCAGATCCTGCAACAA 

GCGGGTACCTCTGTTCTGGCACAG 



Figure 26 






ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATA 

ATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGC 

GTATTAACAGCGCGAAGGATGACGCAGCGGGTCAGGCGATTGCTAACCGTTTTACTTCTA 

ACATTAAAGGCCTGACTCAGGCGGCACGTAACGCCAACGACGGTATCTCTCTGGCGCAGA 

CCACCGAAGGTGCGCTGTCTGAAATCAACAACAACTTACAGCGTGTACGTGAACTGACCG 

TTCAGGCAACCACCGGTACTAACTCCGACrCCGACCTGGCTTCTATTCAGGACGAAATC^ 

AATCCCGTCTGGATGAAATTGACCGCGTATCTGGTCAGACTCAGTTCAACGGCGTGAACG 

TGCTGGCAAAAGACGGTTCCATGAAAATTCAGGTAGGTGCTAACGACGGCCAGACTATCA 

CTATTGACCTGAAAAAAATCGACTCTGATACTCTGGGCCTGAATGGTTTTAACGT^^ 

GTTCTGGGACGATTACCAACT^GCAGa^CTGTCAGTGATGTTACTCGCGCA^ 

CATTGGTGAATGGTGCCTATGATATAAAAACCACTAACACAGaKri^CT 

CCTTCGCGAAATTGAATGATGGTGATGTTGTTACTATCAATAATGGTAAGGATACTGCCT 

ATAAATATAATGCTGCTACAGGTGGGTTTACGACGGATGTCTCCATCTCCGGGGATCCTA 

CCGCTGCTGACGCTACTGCTAATAAAACTGCCCGTGATGCACTTGCGGCGTC^ 

CTGAGCCGGGTAAAACTGTTAATGGTTCTTGGACTACGAATGATGGTAaSG^ 

ATACCGATGCCGATGGTAAGATTTCTATTGGTGGTGTTGCTGCTTATGTA 

GCAACCTOACCACTAACGCAGCAGGTATGAaSACTCAAGCAACT^ 

CTGCTGCTGCATCTGCTACTGGTAAGGGTGGATCCCTGACCTTTGGTGACACGACGTATA 

AAATTGGTCyVGGGTACGGCTGGGGTTGATCCraATGACGCTTCAGATGATC 

CCATTTCTTACTCTAAATCAGTAAGC7AGGATGTTGTTCTTGCTGATACT 

GTAACACGACAACAGTTGATTTCAACTCCGGTATCyiTGACTTaVAAGGTTAGTTTCGAT^ 

CAGGTACATCyU^CTGATACATTCAAAGATGCAGATGGTGCTATCACCAAAACTAi^ 

ACACCACTTCTTATGCTGTAAATAAAGATACTGGTGAAGTTACCGTTGOTGATTATGCT 

CGGTAGATAGCGCCGATAAGGCTGTTGATGATACTAAATATAAACCGACTATCGGCGCGA 

CAGTTAACCTGAATTCTGCAGGTAAATTGACCACTGATACCACCAGTGCAGGCACAGCAA 

CCT^GATCCTCTGGCTGCCCTGGACGCTGCTATCAGCTCCATCGACAAATTCCGTTC^ 

CCCTGGGTGCTATCCAGAACCGTCTGGATTCCGCAGTCACCAACCTGAACAACACCACTA 

CCAACCTGTCCGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCA 

ACATGTCGAAAGCGOVGATTATCCAGCAGGCCGGTAACTCCGTGCTGGCAAAAGCCAACC 

AGGTACCGCAGCAGGTTCTGTCTCTGCTACAGGGTTAA 
AACAAAAACCAGTCTGCGCTGTCGACTTCTATC

GAGCGCCITTCTTCTGGTCTGCGTATTAACAGCGCTAAAGATGACGCTGCGGGCCAGGCG 

ATTGCTAACCGCTTCACTTCTAACATCAAAGGTCTGACTCAGGCCGCACGTA^^ 

GACGGTATTTCTCTGGCGCAGACCACTGAAGGCGCGCTGTCTGAGATTAACAACAACTTG 

CAGCGTGTGCGTGAGTTGACTGTACAGGCGACGACCGGGACTAACTCTGATTCTCACC^ 

TCTTCTATCCAGGATGAAATCAAATCCCGTTTAAGCGAAATTGACCGTGTATCTGGTCAG

ACTCAGTTTAACGGCGTGAACGTACTGGCTAAGAATGACACCCTGTCTATTCAGGTAGGT 

GCAAATGACGGTOlGACTATCAATATTGACCTGCAGCAAATCGATTCravT^ 

CTGGATGGTTTCyVGCGTTAAAAATAATGATGCAGTGAAAACCAGTGCTGCOSTGAATAOT 

CTTGGGGGGGGGGCAGGTTCTGTTGCTCTCGACrTCGCAAC^ 

ACTGGTCTCGGTAGCGGTGCTATGAGCGAAATTGCTAAAGACGATAATGGTGAT^^ 

GCGCATGTCACAGGGACTACGGGTAATACTGCTGATGGTTACTATGCTGTCGATATCGAC 

AAGGCTACCGGTGAGGTCGCTCTCAAAGATGGTAACGTAGATACACCGACAGGTACGC^ 

ACGACGACAAGCACATATGACTTCACAGACGCIXSGTCAAACCGTTT^ 

GCTGCAACAGCCGGTATO^GCACTGGTGCTTCTCTCGTTAAACTTCA 

AATGATACri^TACTTATGCAATCAAAGCACAAGATGGCAGCCTGTAT^ 

GATGAGGCTACCGGTAAAGTCACTGTCAAAACCGCCAGCTATACrGATGCTGAC^ 

GCAGTGACOSATGCCGCTGTAAAACTGGGTGGTGACAATGGCACAACCGAAATTGT^ 

GATGCTGCGTCAGGTAAAACTTACGATGCTGGTGCACTGCAAAACGTTGATCTCTCCAGT 

GCAACCAACACXSGTAACCGCAATCCCGAACGGTAAAACCACGTCrcCGCTGGCTGCCCCT 

GACGACGCAATCAGCCAGATCGACAAATTCCGCTCCTCCCTCGGTGCGGTGCAGAACCGT

CTGGATTCCGCGGTCACCAACCTGAACAACACCACTACCAACCTGTCTGAAGCGCAGTCC 

CGTATTCAGGACGCTGACTATGCGACCGAAGTATCCAACATGTCGAAAGCGCAGATCATC 

CAGCAGGCAGGTAACTCCGTGCTGTCCAAA 
GCGCTGTCGACTTCTATCGAGCGCCTCTCTTCTGGTCTGCGCATTAACAGCGCTAAAG

ATGACGCTGCGGGCCAAGCGATTGCTAACCGCTTCACTTCTAACATCAAAGGTCTGACTC 

AGGCCGCACGTAACGCCy^CGACGGTATTTCTCTGGCGCAGACCACTGAAGGCGCACTGT 

CTGAAATCAACAACAACTTGCAGCGTGTTCGTGAACTGACCGTTCAGGCCACTACCGGTA 

CTAACTCnXSATTCTGACCTCTCrraVATACAGGACGAAATCAAATCCCGTCTCG 

TTGACCGCGTATCCGGTCAGACrCAGTTC7^CGGa3TTAATGTTCTTTCCAAAGATGGTT 

Cy^TGAAAATTCAGGTTGGTGCGAATGATGGTCAAACTATCTCCATCGATCTGAAGAAAA 

TTGATTCTTCAACTTTGGGGCTGAATGGCTTCTCAGTTTCTAAAAACTCTCTO 

GCAATGCTATCACATCTATCCCGCAAGCCGCTAGCAATGAACCTGTTGATGTTAACTTCG 

GTGATACTGATGAGTCTGCAGCAATCGCAGCCAAATTGGGGGTCT 

TGTCGCTGCACAACATCCTTGATAAAGATGGTAAGGCAACAGCTGATTATGTTGTTCAGT 

CAGGTAAAGACITCTATGCTGCTTCTGTTAATGCCGCTTCAGGTAAAGTAACCTTAAAC^ 

CCATTGATGTTACTTATGATGATTATGCGAACGGTGTTGACGATGCCAAGCAAACA 

AGCTGATCTUVAGTTTCAGCAGATAAAGACGGCGCAGCTCAAGGTTTTGTC^ 

GCAAAAACTATTCTGCTGGTGATGCGGCAGACATTCTTAAGAATGGAGCAAaVGCT 

AGTTAACTGATCTGAATTTAAGTGATGTTACraATACTAATGGTAAGGTAACC^ 

CGACTGAGCAATTTGAAGGTGCTTCAACTGAGGATCCGCTGGCGCTTCT^^ 

TTGCATCAGTCGACAAATTCCGGTCTTCTCTAGGTGCCGTGCAGAACCGTCTOSAT^^ 

CTATCACCAACCTGAACAACACCACCACCAACCTGTCTGAAGCGO^GTCC^^ 

ACGCCGACTATGCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATCATCCAGCAGGCA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCG

CTGATCACTCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGT 

ctgtcitctggcntgcgtattaacagcgcgaaggatgacgccgcaggtcaggcgat^ 

aaccgttttacttctaaovttaaaggcctgactcaggctgcacgtaacgcc^ 

atttctgttgcacagaccactgaaggcgcgotgtccgaaatctiacaacaa^ 

attcgtgaactgacggttctiggccactagagggactaactccgattctgacctggactcc 

atccaggacgaaatcaaatctcgtctggacgaaattgaccgcgtatctggtcagacccag 

ttcaacggcgtgaacgtgctgtctaaagatggctcgatgaaaattcaggtcggcgcgaac 

gatggcgaaacgattactattgatctgaagaaaattgactctgatacgctaaatctggct 

ggttttaacgtgaatggtgctggctctgttgataatgccaaggcgactggou^ 

actgatgctggttttacggcaagcgcagcraatgctaatggcaaaatca^ 

gacaccgttactaaattcgacaaagcgacagcggctgatgtattgggcaaagcggctgct 

ggcgatagcattacctatgcgggcactgatactggcttaggagtcgctgctgatgcctcg 

acritaca^cctacaatgcagccaataagtcttacacttttgatgc^ 

gcggatgctggaacggcactgaaagggtacttaggcgcatctaacticcggtaaaatt;^ 

atokstggtaccgagcaagaagttaacattgccaaagatggctccatca^ 

ggcgatgcgctgtatctcgatagtaccggcaacttaacct^aaaataccgcgaatttgg^ 

gctgctgataaagauvctgtagataaactgtttgctggtgctavggatgcaa^ 

ttcgatagoggcatgacy^gctaaattcgato^ctgctggtaccgttgatttc^^ 

gcgtctatttctgctgatgcaatggcatcaaccttaaataatggttcctatacag 

gtaggtggtaaggcttatgccgtaaccgctggcgcyigttcagacaggtggcgcagatgtg 

tataaagataccactggcgcactgacgactgaagatgacgaaaccgttaccgcgacctac 

tacggttttgctgatggtaaagtttctgacggtgaaggttctactgtctataaagcto 

gatggttccatcactaaagatgcgactaccaagtctgaagcaaccactgaccctctgaaa 

gcccttgacgacgcaatcagccagatcgacaaattccgctcctccctcggtgccgtt 

aaccgtctggattccgcostcaccaacctgaacaacaccactaccaacctgtctgaagcg 

cagtcccgtattcaggacgccgactatgcgaccgaagtgtccaacatgtcgaaagcgcag 

atcattcagcaggccggtaactccgtgctggcaaaagccaaccaggtacc^^ 

ctgtctctgctgcagggttaa 
AACAAATCTCAGTCTTCTCTTAGCTCTGCTATTGA

GCGTCTCTCTTCTGGCCTGCGTATTAACAGTGCTAAAGATGACGCAGCAGGTCAGGCGAT 

TGCTAACCGTTTTACGGCAAATATTAAAGGTCTGACTCAGGCTTCCCGTAACGCGAATGA 

TGGTATTTCTGTTGCGCAGACTACTGAAGGTGCGCTGAATGAAATTAACAACAACC^ 

GCGTGTACGTGAACrGACTGTTCAGGCAACTAACX5GTACTAACrrCTGACAGCGATCT^

TTCTATTCAGGCAGAAATTACTCAACGTCTGGAAGAAATTGACCGTGTATCTGAGCAAAC

TCAGTTTAACGGOSTGAAAGTCCTTGCCGAAAATAATGAAATGAAAATTCAGGTT^ 

TAATGATGGGGAAACCATCACTATCAATCrrGGCAAAAATTGATGCGAAAACTCTCGGCCT 

GGACGGCTTTAATATCGATGGCGCGCAGAAAGCAACTGGCAGTGACCTGATTTCTAAACT 

TAAAGCGACAGGTACTGATAATTATCAAATTAACGGTACTGATAACTATACTGTTAATC^ 

AGATAGTGGAGCAGTTCAAAATGAGGATGGTGACGCAATTTTTGTTAGCGCTACCGATGG 

TTCTCTGACTACTAAGAGTGATACAAAAGTCGGTGGTACyVGGTATTGATGCGACl^^ 

TGCAAAAGCCGCAGTTTCTTTAGCTAAAGATGCCTCAATTAAATACCAAGGTAT^^ 

CACCAACAAAGGCACTGATGCT^TTTGATGGCAGTGGTAACGGCACTCTAACCGCT 

TGATGGCAAAGATGTAACCITTACTATTGATGCGACAGGGAAGGACGC^ 

GTCTGATCCTGTTTACAAAAATAGTGCAGGTCAGTTCACTACAACTAAGGTTC 

AGCCGOTACAGCATCGGATCTGGACTTAAATAACGCTAAAAAAGTGGGTAGTTCTTTAGT

TGTAAATGGCGCTGATTATGAAGTTAGCGCTGATGGTAAGACAGTAACTGGGCTTGGC^ 

AACTATGTATCTGAGCyVAATCAGAAGGTGGTAGCCCGATTCTGGTAAAAOAAGATG^ 

AAAATCGTTGCAATCTACTACCAACCCGCTCGAAACCATCGACAAGGCATTGGCT 

TGACAATCrcCGTTCTGACCTCGGTGCAGTACAAAACCGTTTCGACTCT^ 

CCTTGGCAACACCGTAAACAACCTGTCTTCTGCCCGTAGCCGTATCGAAGATGCTGACTA 

CGCGACCGAAGTGTCTAACy^TGTCTCGTGCGO^GATCCTGCAACAAGCGGGTACCTCTGTTCTGGCGCAG
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATA

ATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGC 

GTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTTACTTCTA 

acattaaaggcctgactcaggctgcacgtaacgccaacgatggtatttctgttc 

ccactgaaggcgcgctgtccgaaatcaacaacaacttacagcgtatccgtgaactgacgg 

ttcaggcttctaccgggactaactccgattcggatctggactccattcaggacgaaatca 

aatcccgtctggaosaaattgaccgcgtatctggccagaccctigttcaacggcgtg;^ 

tactggcgaaagacggttcaatgaaaattcaggttggtgcxsaatgacggccagactata 

cgattgatctgaagaaaattgactctgatacgctggggctgagtgggtttaatgtgaatg 

gtagcggggctgtggctaatackky^gcgactaaatctgatttggcagcagct 

TGGCTCCAGGTACTGCTGATGCTT^TGGTACAGTTACCTATACTGTTGGCGC^ 

AAACATCTACAGCTGCAGATGTAATTGOSAGTTTGGCTAATAACGCAAAAGTTAATGCCA 

CAATTGCy^TGGTTTTGGATCGCCAACAGCTAOVGATTATACATACAACA^ 

GCGATTTTACATATAGTGCAACTATTGCAGCTGGTACAAATTC^^ 

CTCAGTTACaU^TCCTTCCTGACACCAAAAGCGGGCGATACTGCrrAAC^ 

TTGGTTCTACGTCAATTGACGTTGTATTGGCTAGOSACGGTAAAATTACCGCGAA^ 

GTTCAGAACTATTTATTGACGTAGATGGTAACCTCACTCAAAACAATGCT^^ 

AAGCAGCCACTCTTGATGCACTGACTAAAAACrGGCATACAACAGGCACACOT 

TATCTACGGTAATTACAACTGAAGATGAAACAACCTTCACTCTGGCTGGC:^ 

CTACTACTTCTGGTGCAATCACTGTAGCAAATGCAAGAATGAGTGCrrGAGTCT 

CGGCAACTAAGTCCACAGGATTCACAGTTGATGTTGGAGCTACTGGTACCAGCGCA 

ATATTAAAGTTGATAGTAAAGGTATAGTACAACAACACACAGGTACAGGTTTTGAAGACG 

CTTACACCAAAGCTGATGGTTCACTGACTACCGATAATACAACCT^TCIKSl^^ 

AAGACGGAACTGTGACCAATGGTTCAGGTAAAGCyVGTCTATGTTTCAGCGGATGGTAAT^ 

TTACTACTGACGCTGAAACTAAAGCTGCAACCACCGCCGATCCACTGAAAGCTCTGGACG 

AAGCGATCAGCTCCATCGACAAATTCCGTTCTTCCCTCGGTGCGGTGCAAAACCGTCTGG 

ATTCCGCAGTCACCAACCTGAACAACACCACTACTAACCTGTCTGAM 

TTCAGGACGCTGACTATGCGACCGAAGTGTCCy^TATGTCGAAAGCGCAGATCATCCAGC 

AGGCCGGTAACTCCGTGCTGGCAAAAGCTAACCAGGTACCGCAGCAGGTTCTGTCTCrGCTGCAGGGTTAA
AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAGCGCCTCTCTT

CTGGTCTGCGCATTAACAGCGCTAAAGATGACGCTGCGGGCCAGGCGATTGCTAACCGCT

TCACTTCTAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCAACGACGGTATCTCTC 

TCGCGCAGACCACTGAAGGCGCACTGTCTGAAATCAACAACAACTTGCAGCGTGTT 

AGCTGACCGTTCAGGCCACTACCGGTACTAACTCTGATTCTGACCTGTCTTOA 

ACGAAATCAAATCCCGTCTCGATGAAATTGACCGCGTATCCGGTCAGACTCAGTTCAACG 

GCGTGAACGTACTGGCAAAAGATAACACCATGAAGATTCAGGTTGGTGCGAA^^ 

AGACTATATCCATCGACCTGCAAAAAATCGACTCTTCTACTCTTGGTTTGAACGGTTTCT 

CCGTTTCTAAAAATGCrCTCGAAACTAGCGAAGCGATCACTCAGTTGCCGAACGGTGCGA 

ATGCyVCCAATCGCTGTGAAGATGGATGCGTCTGTTCTGACCGATCTTAACATTACTGATG 

CTTCCGCTGTTTCGCTGCTICAACGTAACTAAAGGTGGTGTCGCAACGTCTAC^ 

TTCAGTATGGCGATAAGAGCTATGCAGCATCTGTTGATGCGGGAGGTACAGTAAAACTGA 

ATAAAGCCGACGTAACATATAACGACGCAGO^TGGTGTTACGAATGCCACCCAGATTG 

GTAGTCTTGGTTCAGGTTGGTGCTGATGCAAACAATGATGCAGTTGGT^^ 

AGGGGAAAAACTATGTTGCTAATGACTCATTAGTCAATGCrAATGGCXSCTGCT^ 

CAGCAACTAGAGTTACT^TTGATGGTGATGGTAGCCTTGGAGCT^ 

AACTTAGCCAAAATGGTGCTACTGCTGCAACATCAGAGTTCGCTGGTGC^ 

ATCCACTGACrCTGCTGGACAAAGCTATCGCATCTGTTGATAAATTCCGTTCTO 

GGGCGGTACAGAACCGTCTGAGCTCOXrrGTAACCAACCTGAACTACACC^ 

TGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAACATGT

CGAAAGCGCAGATCATCCAGCAGGCAGGTAACTCCGTGCTGTCCAAA 
ATGGCJ^CAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATAATATCAACAAGA

ACCAGTCrcCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTTGCGTATTAACAGCG 

CGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTTACTTCTAACATTAAAGGCC 

TGACTCAGGCTGCACGTAACGCCAACGACGGTATTTCTGTTGCACAGACCACTGAAGGCG 

CGCnXSTCCGAAATCAACAACAACTTACAGCGTATTCGTGAACTGACGGTTCAGGCG^^ 

COSGAACTAACTCCACCTCTGACCTGGACTCCATTCAGGACGAAATa^TCCCGTCTTG 

ATGAAATTGACCGCGTATCCGGCCAAACCCAGTTCAACGGCGTGAACGTACTGTCAAAAG 

ATGGCTCGATGAAAATTCAGGTCGGCGCTVAATGATGGTGAAACCATCACGATTGATCT^ 

AAAAGATCGACTCTTCTACATTGAAGCTGACCAGCTTCyU^TGTTAACGGTAAAGGC^^ 

TTGATAATGCTAAAGCaiCTGAAGCAGATCTGACCGCTGaSGGCTTCTCCaUVGGTGC^ 

TCGTCAGTGGCAACAGCACCTGGACTAAATCTACTGTTACTACCTTTAATGCAGCAACAG 

CTACCGACGTGCTGGCAAGCGTTAGCGGCGGCAGCACTATTAGCGGTTATACCGGTACAA 

ACAATGGATTAGGCGTAGCGGCTTCTACTGCATATACCTACAACGCAACCAGCAAGTCTT 

ATTCAlTTGACGCAACCGaiCTTACCy^TGGCGATGGTACTGGGG 

CTGATGTGCTGAAAGCCTATGCAGCAAACGGTGATAATAaSGCTCAGATCTCCATCGG 

GAAGCGCTCAGGACGTTAAAATTGCCAGCGATGGCACCCraACTGACGT^ 

CTTTATATATTGGTTCTGACGGCAACCTGACTAAAAACCAGGCCGGaSGTCCA^ 

CAACGTTGGACGGTATTTTCAACGGTGCGAATGGTAATGCAGCAGTTGATGCG^ 

CATTCGGCAGCGGO^TGACCGTTGATTTCACCCAGGCrAGCAAAAAAGTGGA^^ 

GCGCAACGGTATCCGCCGAAGATATGGACACTGCGTTAACTGGGCAGGCTTATACCGTAG 

CTAACGGCGCACAGTCTTTTGACGTTGCCGCTGGTGGGGCAGTAACCGCTACTACAGGTG 

GCGCTACCGTAAATATTGGTGCTGATGGTGAACTGACGACTGCGACa^CAAGACT^ 

CAGAAACTTATCACGAATTTGCTAACGGCAATATTCTGGATGATGACGGCGa^TC^ 

ACAAAGCGGCTGACGGTTCTCTGACCACTGAAGCTACTGGTAAATCCGAAGTGACC^ 

ATCCGCTGAAAGCGCTGGACGATGCTATOSCATCCGTAGACAAATTCCGCTCCrCCCTCG 

GTGCGGTGCAGAACCGTCTGGATTCCGCAGTCACCAACCTGAACAACACCACTACC^ 

TGTCTGAAGCGCAGTCCCGCATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGT 

CGAAAGCGCAGATCATCCAGCAGGCCGGTAACTCCGTGCTGGCTUU^GCCAACCA 

CGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCAC

TCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTC 

TGGCTTGCGTATTAACyVGCGCTAAGGATGACGCCGCGGGTCAGGCGATTGCTAACCGTTT 

TACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAACGACGGTATTTCTGT 

TGCGCAGACCACTGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGTATCCGTGA

ACTGACGGTTCAGGCTTCTACCGGGACTAACTCCGATTCGGATCTGGACTCCATTCAGGA 

CGAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCTGGCCAGACCCAGTTCAACGG 

CGTGAACGTACTGGCGAAAGACGGTTCAATGAAAATTCAGGTTGGTGCGAAT6ACGGCCA 

GACTATCACTATTGATCTGAAGAAAATTGACTCAGATACGCTGGGGCTGAGTGGGTTTAA 

TGTGAATGGTGGCGGGGCTGTTGCTAATACTGCAGCGACTAAAGATGATTTGGTCGCTGC 

ATCAGTTTCAGCTGCGGTAGGTAATGAATACACTGTCTCTGCTGGCCTGTCGAAATC^ 

TGCTGCTGATGTTATTGCTAGTCTCACAGATGGTGCGACAGTAACTGCGGCTGGTGTAA 

CAATGGTTTTGCTGCAGGGGCAACr<3GAGATGCTTATAAAOT 

TTTTACTTACAATACCACCTaU^CAGCGGCAGAACTCCAATCTTACCT 

GGGGGATACCGCAACTTTCTCCGTTGAAATTGGTGGCACCAAGCAGGATGTTGTTCTGG 

TAGTGATGGCyVAAATCavCAGCAAAAGACGGGTCTAAACrrTATATT^ 

TTTAACCCAAAACGGTGGAGGTACTTTAGAAGAAGCTACCCTCAATGGCITAGCT^ 

CCACTCTGGTCCAGCCGCTGCTGTACAATCTACTATTACTACTGCGGATGGAACTTCy^T 

AGTTCTAGCAGGTTCTGGCGACTTTGGAACAACAi\AAACIXK:TG^ 

AGGAGCAGTGATCAGTGCTGATGCACTTCTTTCCGCCAGTAAAGCGACT^ 

TGGCACTTATACCGTAGGTACAGATGGAGTTGTTAAATCTGGTGGCAATGACGTTTATAA 

O^GCTGACGGGACGGGATTAACTACrrGACAATACCACAAAATATTAT^ 

CX3GGTCTGTAACTAATGGTTCTGGTAAAGCTGTGTATGCTGATGCAACAGGAAAACT 

TACTCACGCrroAAACTAAAGCCGAAACCACCGCCGATCCCCTGAAAGCTCTG^ 

GATCy^GCTCCATCGACAAATTCCGTTCTTCCCTCGGTGCGGTGCAAAACCGTCTGGATTC 

CGCGGTCACC7ACCnX3AACAACACCACTACCAACCTGTCCGAAGCGCAGTCCCGTATT 

GGACGCCGACTATGCGACCGAAGTGTCCAACATGTCGAAAGCGCTIGATCATCCAGCAGGC 

CGGTAACTCCGTGCTGGCAAAAGCTAACCAGGTACCGCAGa^GGTTCTGTCTCTGCTGCAGGGTTAA
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCAC

TCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTC 

TGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCGGGTCAGGCGATTGCTAACCGTTT 

TACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAACGACGGTATTTCCGT 

TGCGCAGACCACCGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGTA^^^ 

ACTGACGGTTCAGGCCACTACCGGTACTAACTCCGATTCTGACCTGGACTCCATCCAGGA 

CGAAATCAAATCTCGTCTTGATGAAATTGACCGCGTATCTGGTCAGACCCAGTTCAATGG 

CGTGAATGTGTTGTCCAAAGACGGTTCAATGAAAATTCAGGTGGGCGCAAATGATGGTGA 

AACCATCACGATTGACCTGAAAAAAATCGACTCrTCTACACTGAAGCrcAC 

CGTCAACGGTAAAGGCGCTGTTGATAATGCAAAAGCCACTGAAGCAGATCTGACCGCTO 

GGGCTTCTCCCAAAGTGCAGTTGTCAGTGGCAATAGCACCTGGACTAAATCTACTGTTAC 

TACCTTTAATGCAGCAACAGCTACCGATGTGCTCGCTAGCGTTAGTGGCGGCAGC^ 

TAGCGGTTATGCTGGCACAAACAATGGGTTAGGCGTAGCGGCTTCTACTGCATATACCTA

CAACGa^CCAGCAAGTCTTATTCATTTGACGOUiCCGCACnTACTAAT^ 

TGCGGGCTCAACTAAAGTTGCTGATGTTCTGAAAGCCTATGCAGCAAACG^ 

GGCTCAGATCTCCATOSGTGGTAGCGCTCAGGAAGTTA/UATTGCaiGCGATGGTACC 

GACGGATACTAATGGCX5ATGCTTTATACATTGGTGCTGACGGTAACCTGACGAAAM 

GGCCGGCGGCCCyvGCCGCGGCAACGTTGGACGGTATTTTCAACGGTGCG^ 

TGCAGTTGATGCGAAGATTACCnrCGGCAGCGGCATGACCGTTGACIT 

CAACAATGTGGATATTAAGGGCGCGACGGTATCCGCCGAAGATATGAACACTGCGTTAAC 

CGGTCAGGCTTATACCGTAGCTAACGGCGOVCAGTCITATGACGTrGCCGCTGATGGTGC 

AGTAACTGCTACTACAGGTGGAGCGACCGTAAATATTGGTGCTOAGGGTGAACTGACGAC 

TGCGGCO^CAAGACTGTCACAGAAACTTATCACGAATTTGCTAACGGCAATATTCTG^ 

TGATGACGGCGCGGCTCTGTATAAAGCGGCTGACGGCTCTCTGACCACTGAAGCTA^^ 

TAAATCTGAAGCGACCACGGATCCGCTGAAAGCGCTGGACGATGCTATCGCATCCGTAGA 

CAAATTCCGTTCTTCCCTGGGTGCCGTGCAGAACCGTCrrGGATTCCGCAGTC^ 

GAACAACACCACTACCAACCTGTCCGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGC 

GACCGAAGTGTCCAACATGTCGAAAGCGCAGATTATTCAGCAGGCAGGTAACT^ 

GGCAAAAGCTAACCAGGTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
AACAAAAACCAGTCTGCGCTGTCGACTTCTAT

CGAGCGCCTCTCTTCTGGTCTGCGCATTAACAGCGCTAAAGATGACGCTGCGGGCCAGGC 

GATTGCTAACCGCTTCACTTCrrAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCA^ 

CGACGGTATCTCTCTGGCGCAGACCACTCAAGGCGCACTGTCTGAAATCAACAACAACTT 

GCAGCGTGTGCGTGAGTTGACTCTTGAGGCGACGACCGGGACTAACTCTGATTCTG^^ 

GTCTTCTATTCAGGACGAAATCAAATCCCGTCTGGATGAAATTGACCGTGTTTCCGGTCA 

GACCCAGTTCAACGGCGTGAACGTGCTGGCTAAAAACGGTTCTATGGCGATTCAGGTTGG 

CGCGAATGATGGGCAGACCATCAACATCGACCTGCAGAAAATCGACTCTTCTACTCTGGG 

CCTGGGCGGCTTCTCCXjTATCTAACAATGCACTGAAACTGAGCGATTCTATaVCTC^ 

TGGTGCGAGTGGTTCACTGGCAGATGTGAAACTGAGCTCTGTTGCCrCGGCTCTGGGTGT 

AGACGCAAGCACTCTGACTCTGCACAACGTACAGACCCCAGCTGGCGCAGCAACAGCTAA 

CTATGTTGTCTCTTCTGGTTCTGACAACTACTCAGTATCTGTTGAAGATAGCTCCGGTAC 

AGTTACGCTGAACACCACTGATATAGGTTATACCGATACCGCTAATGGCGTTACTACCGG 

TTCCATGACTGGTAAGTACGTTAAAGTTGGAGCTGATGCATTGGGTGCTGCTGTATC^ 

TGTCACCGTACAGGGACTUU^CTTCAAAGCrrGATGCTGGCGCGCTGGTTAACT 

TGCTGCTGGTAGTCAGAATGTTACTTCTGCAATTGGCGATATTGCTAATAAAGCGAATGC 

TAACATTTACACTGGAACCrCTTOTGCAGATCCACTGGCTCT^^ 

ATCTGTTGATAAATTCCGTTCTTCTCTAGGGGCGGTGCAGAACCGTCTGAGCTCTGCTGT 
AACCAACCTGAACAAC71CCACTACCAACCK3TCCGAAGCGCAGTCCCGTATT^ 
CGACTATGCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATCATCCAGCAGGCGGGTAA 
CTCCGTGCTGTCTAAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCA

CrCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTT 

CTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCCGGTCAGGCGATTGCTAACCGTT 

TTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAATGACGGTATTTCTG 

TTGCTICAGACCACTGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGT^ 

AACTGACGGTTC71GGCTTCTACCGGGACTAACTCTGA1TCXK5ATCTGGACTCCATTCAC^ 

ACGAAATCAAATCCCGTCTCXSACGAAATTGACCGCGTATCCGGTCAGACCCAGTTC^ 

GCGTGAACGTACTGGCAAAAGACGGTTCGATGAAAATTCAGGTTGGTGCGAACGACGGCC 

AGACTATCACTATTGATCTGAAGAAAATTGACTCTGATACGCTGGGGCTGAGl^ 

ACGTTU^TGGTAGCGCAGATAAGGCAAGTGTCGCGGCGACAGCTGACGGAATGGTTAAAG 

ACGGATATATCAAAGGGTTAACTTCATCTGACGGCAGCACTGCATATACTAAAACTACAG 

aVAATACTGCAGCAAAAGGATCTCATATTCTTGCGGCGCITAAGACT^ 

CCGCAACAGGTGCAAATAGCCTTGCTGATAATGCGACATCGACAACTTATACTTAT^ 

CAACCAGCJ^TACCTTCTCCTATACGGCTGACGGTGTAAACOUIACGAATGCTC 

ATCTCATACCTGCAGCAGGGAAAACGACAGCTGCATCAGTTACTATTGGTGGGACAGCAC 

AGAATGTAAATATTGATGATTCGGGCAATATTACTTaUVGTGATGG<^ 

TGGATTCAACAGGTAACCTGACTAAAAACCAGGCCGGCAACCCGAAAAAAGCAACCGT^ 

CTGGGCTTCrCGGAAATACGGATGCGAAAGGTACrrGCTGTTAAAACAACCATCAAGACA^ 

AGGCTGGTGTAACAGTTACAGCTGAAGGTAATACAGGTACTGTAAAAATTXSAAGGTCCT^ 

CTGTTTCAGCATCTGCATTTACGGGCATTGCATATTCCGCCAAC^^ 

ATGCTGTTGCCGCAAATAATACTACAAATGGTTTCCTGGCGGGGGATGACTTAACCCAGG

ATGCTCAAACTGTTTCAACCTACTACTCGCAAGCCGATGGCACGGTCAOTAATAGCGC^^ 

GCAAAGAAATCTATAAAGACGCTGATGGTGTCTACAGCACAGAGAATAAAACATCGAAGA 

CGTCCGATCCATTGGCTGCGCriTGACGACGCJATCAGCrrCCATCGAC^ 

CCTTGGGTGCTATCCyiGAACCGTCTGGATTCCGCGGTCACCAACCTGAACAACACC^ 

CCAACCTGTCCGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCA 

ACATGTCGAAAGCGCAGATCATCCAGCyWSGCCGGTAACTCCGTGCTGGCAAAAGCTAACC 

AGGTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGCTAA 
AACAAATCTCAGTCTTCTCTGAGCTCCGCCATTGAACGTCTCTCTTCTGGCCTGCGTA

TTAACAGTGCTAAAGATGACGCAGCAGGTCAGGCGATTGCTAACCGTTTTACAGCAAATA 

TTAAAGGTCTGACTCAGGCTTCCCGTAACGCGAATGATGGTATTTCTGTTGCGCAGACCA 

CTGAAGGTGCGCTGAATGAAATTAACAACAACCTGCAGCGTGTACGTGAACTGACTGTTC 

AGGCyVACTAACGGTACTAACTCTGACAGCGATCTTTCTTCTATCCAGGCTGJ^ 

AACGTCTGGAAGAAATTGACCGTGTATCTCAGCyVAACTCAGTTTAACGGCGTGAAAGTCC 

TTGCTGAAAATAATGAAATGAAAATTCAGGTTGGTGCTAATGATGGTGAAACCATCACTA 

TCAATCTGGCAAAAATTGATGCGAAAACTCTOSGCCTGGACGGTTTTAATATCGATGGCG 

CGCAGAAAGCAACTGGCAGTGACCTGATTTCTAAATTTAAAGCGACAGGTACTGATAACT 

ATGATGTTGGCGGTGATGCTTATACTGTTAACGTAGATAGCGGAGCTGGGTAATGACTCC 

AACTTATTGATAGTGTTTTATGTTCAGATAATGCCCGATGACTTTGTCATGCAGCTCC^^ 

CGATTTTGAGAACGACAGCGACTTCCGTCCa\GCCGTGCCAGGTGCrcC^^ 

GTTATGCCGCTCAATTCGCreCGTATATCGCTTGCTGATTACGTGCAGCTT^ 

GCGGGATTCATACAGCGGCCyiGCaiTCCGTCATCCATATCACCT^CGTCAAAGGGTGACJ^ 

CAGGCrCATAAGACGCCCCAGCGTCGCCATAGTGCGTTCACCGAATACGTGax:AAC^ 

CGTCTTCCGGAGCCTGTCATACGCGTAAAACAGCCAGCGCTGGCGCGATOT 

ATAGTCCCACTGTTCGTCCATTTCCGCGCAGACGATGACGTCACTGCCCGGCTGTATGCG 

CGAGGTTACCGACTGCGGCCTGAGTTTTTTAAGTGAaSTAAAATCGTGTTGAGGCCAACG 

CCCATAATGCGGGCAGTTGCCCGGCATCCyu^OSCCavTTCATGGCCATATCAATGATO 

TGGTGCGTACCGGGTTGAGAAGCX3GTGTAAGTGAACTGCAGTTGCCATCTTTTAC 

TGAGAGCAGAGATAGCGCTGATGTCCGGCGGTGCTTTTGCCGTTACX5CACCACCCCGTCA 

GTAGCTGAACAGGAGGGACAGCTGATAGAAACAGAAGCCACTGGAGCACCTCAAAAACAC 

CATCATACACTAAATCAGTAAGTTGGCAGaVTTACCGCGGAGCTGTTAAAGATACTACAG 

GGAATGATATTTTTGITAGTGCAGCAGATGGTTCACIXSACAACTAAATCTGA^ 

TAGCTGGTACAGGGATTGATGCTACAGCACrrCGaVGCAGCGGCTAAGAATAAAGCACAG^ 

ATGATAAATTCACGTTTAATGGAGTTGAATTCJ^CyVACTUVCAACTGCA^ 

GGAATGGTGTATATTCTGCAGAAATTGATGGTAAGTCAGTGACATTTACTGTGACAGATG 

CTGACAAAAAAGCITCTITGATTACGAGTGAGACAGTTTACAAAAATAG 

ATACGACAACCyu^GTTGATAACAAGGCTGCCACACTTTCCGATCTTGATCTCAATC 

CTAAGAAAACAGGAAGCACGTTAGTTGTTAACGGTGCAACTTACGATGTTAGTGCAGATG 

GTAAAAOSATAACGGAGACTGCTTCTGGTAACAATAAAGTCATGTATCTGAGCAAATC^ 

AAGGTGGTAGCCCGATTCTGGTAAACGAAGATGCAGCAAAATCGTTGCAATCTACCA 

ACCCGCTCGAAACTATCGACAAAGCATTGGCTAAAGTTGACT^TCTGCXSTTC^ 

GTGCyiGTACAAAACCGTTTCGACTCTGCTATCyiCCAACCTTGGCAACAC 

TGTCTTCTGCCCGTAGCCGTATCGAAGATGCTGACTACGCGACCGAAGTGTCTAACATGT 

CTCGTGCGCAGATCCTGCAACAAGCGGGTACCTCTGTTCTGGCGCAG 
AACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGT

CTTCTGGCTTGCGTATTAACyVGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACC 

GTTTTACTTCTAACTVTTAAAGGCCrrGACTCAGGCTGCACGTAACGCCAACGACG^ 

CTGTTGCGCAGACCACCGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGTGTGC 

GTGAACTGACCGTTCAGGCAACCACCGGTACCAACTCCCAGTCTGACCTGGACTCTATCC 

AGGACGAAATTAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGTCAGACCCAGTTCA 

ACGGCGTGAACGTACTGGCAAAAGACGGTTCCATGAAAATTCAGGTTGGCGCGA^ 

GCCAGACCATO^CTATCGACCTGAAGAAGATTGACTCTTCTACGCTGAAACTC 

TTAACGTGAATGGCAAAGCAGCGGTTGATAATGCTAAAGCGACGGATGCAAATCTGACTA 

CCGCCGGTTTTACACTAGGCGTTGTGGATTCAAATGGTAATAGTACTTGGACTAAATCA^ 

CTACGACTAATTTCGATGCGGCAACTGCAGTAAACGTACTAGCAGCAGTTAAAGATGGCA 

GCACAATCAATTACACCGGTACTGGTAATGGTTTAGGGATTGCTGCAAC^ 

CATATCACGATAGCACTAAATCCTATACCTTTGATTCTACGGGGGCrGCAGTAGCTGGTC 

CCGCGTCa^GCCTGCAAGGTACTTTTGGTACAGATACGAATACTGCAAAAATCACCATCG 

ATGGTTCTGCTCAAGAAGTAAACATCGCTAAAGATGGGAAAATTACTGATACTQATTC 

AAGCTTTATATATCGATTCCACTGGTAATTTGACTAAGAACGGCTCTGATACTCT 

AGGCAACATTGAATGATGTCCTTACKKSTGCTAATTCAGTTGAT^ 

TCGATAGCGGCTlTGTCTGTCACCCrTGATAAAGTGAACAGCACTGTAGATATCACT^ 

CATCTATTTCAGCCGCroCAATGACTAATGAGTTGACAGGTAAGGCCTATA^^^ 

ATGGTGCAGAATCTTACGCTGTAGCTACTAATAACACAGTAAAAAOSACTGCTGATC 

AAAATGTTTATGTTGATGCTAGTGGTAAATTAACTACTGATGACAAAGCCACT 

AAACTTATCATGAATTTGCGAATGGOUVTATCTATGATGATAAAGGCGCTGCTGTTT 

CGGCGGCGGATGGTTCTCTGACTACyiGAAACTACAAGTAAATCAGAAGCTACAGCTAACC 

CGCTGGCCGCTCTGGACGACGCAATCAGCCAGATCGACAAATTCCGTTCATCCCr^ 

CTATCCAGAACCGTCTGGATTCCGCAGTCACCAACCTGAACAACACCACrACCAATCT^ 

CTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGTCGA

AAGCGCAGATCATCCAGCAGGCAGGCAACTCCGTGCTGGCAAAA
AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAGCGCCTCTC

TTCTGGTCTGCGCATTAACAGCGCTAAAGATGACGCTGCGGGCCAGGCGATTGCTAACCG 

CTTCACTTCTAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCAACGACGGTATCTC 

TCTGGCGCAGACCACTGAAGGCGCACTGTCTGAAATCAACAACAACrrTGCyVGCGTG 

TGAACTGACCGTTCAGGCCACTACCGGTACTAACTCTGATTCTGACCTGTCTTCAATCCA 

GGACGAAATCAAATCCCGTCTCGATG7VAATTGACCGCGTATCCGGTCAGACTCAGTTCAA 

CX3GCGTGAACGTACTGGCAAAAGATGGCTCGATGAAAATTCAGGTCGGTGCAAATGATGG 

TCAGACAATCAGCATTGATTTGCAGAAGATTGATTCTTCTACTTTAGGGTTAAATTC 

TTCroTTTCCAAAAATGCAGTATCTGTTGGTGATGCTATTACTCAA 

GGraGCCGATGCACCAGTAACCATCAAGTTTGATGATTCAGTAAAAACTGATTTAAAACT 

GACCGATGCTTCAGGGTTAAGTCTGCATAACCTCAAAGATGAAAATGGTAATTTA^ 

CO^GTATGTTGTACAGAATGGCGGAAAATCTTACGCTGCTACAGTCGCTGCCAATGGTAA 

TGTTACGCTGAACAAAGCAAATGTAACCTACAGCGATGTCGCAAACGGTATTGATACCGC 

AACGCAGTCAGGCCAGTTAGTTCAGGTTGGTGCAGATTCTACCGGTACGCCAAAAGCATT 

CGTGTCTGTCCAAGGTAAAAGCTTTGGCATTGATGACGCCGCCTTGAAGAATAACACTGG 

TGATGCTACCGCTACTOUVCCGGGAACATCrrGGGACAACAGTTGTCGCAGaSTC^ 

TCTGAGTACGGGCAAAAACTCTGTAGACGCTGATGTAACGGCTTCCACTGAATTCACATC 

TGCTTCAACCAACGATCCACTGACTCTGCTGGACAAAGCTATCGCATCTGTTGATAAATT 

CCGTTCTTCTTTGGGGGCGGTACAGAACCGTCTGAGCTCOXrrGTAACCAACC^ 

CACCACCACCAACCTGTCTGAAGCGO^GTCCCGTATTCAGGACGCCGACTATGCGACC^ 

AGTGTCCAACATGTCGAAAGCGCAGATTATCCAGCAGGCAGGTAACTCCGTGCTGTCCAAA
AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAGCGCCTCTCTTCTGGTC

TGCGCATTAACAGCGCTAAAGATGACGCTGCGGGCCAGGCGATTGCTAACCGCTTCACTT 

CTAACATCT^GGTCTGACTCAGGCTGCACGTAACGCCAATGACGGTATTTCTCTAGCAC 

AGACAGCGGAAGGCGCGCTGTCAGAGATTAACAACTUlCTTGCyiGCGTGTGCGTGAGTTC 

CCGTGCAGGCAACCACTGGTACCAACTCTGATTCCGATCTCTCTTCTATTCAGGATGAAA 

TTAAATCTCGTCTGGATGAAATTGACCGCGTCTCTGGTCAGACCCAGTTTAACGGCGTGA 

ACGTACTGGCTAAAAACGGTTCTATGGCAATTCAGGTTGGCGCGAACGATGGCCAGACTA 

TCrrCTATCGACCTGCAGAAAATAGACTCTTCTACrrCTGGGTCTGAGCGGCTTCTCTGT^ 

CTCAGAACTCCCTGAAACTGAGCGATTCrrATCACTACGATCGGCAATACTACTGCTGCAT 

CGAAGAACGTGGACCTGAGOTCAGTAGCAACTAAACTGGGCGTGAATGCAAGCACCCTGA 

GCCTGCACGAAGTTCAGGACTCTGCTGGTGACGGTACrroGTACCTTCGTTGT^ 

GCAGCGACy^CTATGCTGTGTCrcTAGACGCGGCCrrCTGGTGCAGTTAA 

CTGACGTCACCTATGATGACGCTACTAATGGTGTTACTGGCGCGACTCAGAACGGTCAGC 

TGATCAAAGTAACTTCTGACGCOUVCGGTGCAGCTGTTGGTTACGTAACCAT^ 

AAAACTATCAGGCTGGTGCGACCGGTGTTGACGTTCTGGCGAACAGCGGTGOT 

CAACTACAGCTGTTGATACCGGTACTCTGCAACTGAGCGGTACriX^ 

TGAAAGGTACTGCAACTCAGAACCCyiCTGGCACTATTGGACAAAGCTATCGCTTCTC 

ATAAATTCCGTTCTTCTCTGGGTGCGGTACAGAATCGTCTGAGCTCrGCTGTAACC^ 

TGAATAACACCACCACTAAC(nX3TCTGAAGCGCAGTCCa3TATTCAGGATGC^^ 

CGACCGAAGTGTCAAATATGTCTAAAGCGCAGATCGTTCAGCAGGCCGGTAAC 
AACAAATCTCAGTCTTCTCTTAGCTCTGCTATTGAGCGTCTGTCTTCT

GGTCTGCGTATTAACAGCXSCAAAAGACGATGCAGCy^GGTCAGGCGATTGCTAACCGTTTT 

ACGGCAAATATTAAAGGTCTGACCCAGGCTTCCCGTAACGCAAATGATGGTATTTCTGTT 

GCGCAGACCACTCAAGGTGCGCTGAATGAAATTAACAACAACCTGCAGCGTAT^ 

CrTTCTGTTCAGGCAACTAACGGTACTAACTCTGACAGCGATCTTTCTTCTATCCA 

GAAATTACTCAACGTCTGGAAGAAATTGACCGTGTATCTGAGCAAACTCAGTTTAACGGC 

GTGAAAGTCCTTGCTGAAAATAATGAAATGAAAATTCAGGTTGGTGCTAATGATGGTGAA 

ACCATCACTATCAATCTGGCAAAAATTGATGCGAAAACTCTCGGCCTGGACGGTTT^ 

ATCGATGGCGCGCAGAAAGCAAO^GGCAGTGACCTGATTTCTAAATT^ 

ACTGATAATTATGATGTTGGCGGTAAAACTTATACCGTGAATGTGGAGAGCGGCGCGGTT 

AAGAATGATGCTAATAAAGATGTTTTTGTAAGCGCAGCTGATGGATCGCTGACGACCAGT

AGTGATACTAAAGTATCOSGTGAAAGTATTGATGCAACAGAACTAGCGAAACTTGCAATA 

AAATTAGCTGAO^GGCTCCATTGAATACAAGGGCATTACATTTACTAACAACACTG^ 

GOIGAGCTTGATGCTAATGGTAAAGGTGTTTTGACCGCAAATATTC 

CAATTTACTATTGACAGTAATGCACCCACGGGTGCCGGCGCAACAATAACTAaVGA^ 

GCTGTTTACAAAAACAGTGCGGGCaVGTTCACaiCTACAAAAGTGGAAAATA^ 

ACACTCrCTGATCTGGATCTTAATGCAGCa^GAAAACAGGTAGCACTTTAG^^ 

GGCGCCACCTACAATGTCAGCGCAGATGGTAAAACGGTAACTGATACTACTCCTGGTGCC

CCTAAAGTGATGTATCTGAGOU^TCAGAAGGTGGTAGCCCGATTCTGGTAAAC^^ 

GCAGCAAAATCGTTGCAATCTACCACCAACCCGCTCGAAACTATCGACAAGGCAT^^ 

AAAGTTGACyVATCTGCGTTCTGACCTCGGTGCAGTACAAAACCGTTTCGACT 

ACa^CCTTGGCAACACCGTAAACAACCTGTCTTCTGCCCGTAGCCGTATCGAAGATGCT 

GACTACGCGACCGAAGTGTCTAACATGTCTCGTGCGCAGATCCTGCAACAAGCGGGTACC 

TCTGTTCTGGCGCAG 



Figure 43 



S:\P30384\nG7-69 



ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACT 

CAAAATAATATCAACTUVGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCT 

GGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTC 

^^^CTAACArrAAAGGCCTGACTCAGGCTGCACGTAACGCCAACGACGGTATTTCTGW 

GCACAGACCACCGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGTATCCGTG^ 

CTGACGGTTCAGGCTTCTACCGGGACTAACTCTGATTCGGATCTGGACTCCATO 

GAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGCCAGACCCAGTTCAACGGC 

GTGAACGTCCTGGCGAAAGACGGTTCAATGAAAATTCAGGTTGGTGCGAATGACGGC

ACTATCACTATTGATCTGAAGAAAATTGACTCTGATACTCTGGGTTTGAGTGGAT^ 

GTGAATGGCAAAGGGGCTGTGGCTAACGCAAAAGCGACCGAAGCAGATTTAACGGGGGCT 

GGTTTCTCTCAAGGAGCGGTGGATAOU^ACGGAAATAGTACTTCGAC^^ 

ACCAATTACTCAGCTGCAACAACTGCTGACTTGTTATCGACCATTAAGGATGGCTCTACT

GTTACATATGCAGGGACAGACACCGGATTAGGGGTCGCAGCAGCAGGAAATTATACTTAT 

GATGCGAACAGTAAATCTTATTCCTTCyVATGCCAATGGTCTGACGGGCGCA^ 

ACTCCACTCAAAGGTTACTTGGGGACAGGTGCTAACACCGCTAAAATTTCTATCGGTGGT 

ACAGAGCAGGAAGTGAATATTGCCAAAGATGGCACTATTACAGATACXSAATGGT^ 

CTCTATCTGGATATTACCGGCAACCTGACTAAGAACTATGCGGGTTCACCACCTGCA 

ACGCTGGATAACGTATTAGCTTCCGCAACTGTAAATGCa^CTATCAAGTTTGATAGC^ 

ATGACGGTTGAITACACTGCAGGTACTGGCGCGAATATTACAGGTGCATCCATTTC^ 

GATGACATGGCCGCAAAACTGAGCGGAAAGGCGTACACTGTTGCC^ 

TATGACGTTGCTGmGTTACGGGGGCTGTAACAACTA»GCAGGTAATTCACCTC 

GCCGATGCAGACGGTAAATTAACGACGAGTGCCAGTAATACGGTTACTCAGACTTATCAC 

GAGTTTGCTAATGGTAACATTTATGATGACAAAGGCTCGTCACTGTATAAAGCTG^ 

GGCTCTCTGACTTCTGAAGCTAAAGGGAAATCTGAAGCAACCGCCGATCCCCTGAAAGCT 

CTGGACGAAGCCATCAGCTCCATCGACAAATTCCGCTCCrCCCTCGGTGCCGTTCT^^ 

CGTCTGGATTCTGCGGTGACCMCCrGAACAA»CCACTACCAACCTGTCrrGAAGC6CAG 

TCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGTCGAAAGCGCAGATC 

ATCCAGCAGGCCGGTAACTCCGTGTTGGCAAAAGCTAACCAGGTACCGCAGCAGGTTCTG 

TCTCTGCTGCAGGGTTAA 
GCGCTGTCGACTTCTATCGAGCGCCTCTCTTCTGGTTTGCGCATTAACAGCGCTA

AAGATGACGCTGCGGGCCAGGCGATTGCTAACCGCTTCACTTCTAACATCAAAGGTCTGA 

CTCAGGCCGCACGTAACGCCAACGACGGTATCTCTCTGGCGCAGACCACTGAAGGCGCAC 

TGTCTGAAATCAACy^CAACTTGCAGCGTGTTCGTGAACTGACCGTTCAGG^ 

GTACTAACTCTGATTCTGACCTGTCTTCAATCCAGGACGAAATCAAATCCCGCITGG^ 

AAATCGATCGTGTCTCTGGTCAGACCCAGTTCAACGGCGTGAACGTGCTGGCTAAAAACG 

GTTCTCTGAATATTCAGGTTGGCGCGAATGATGGGCAGACCATCTCTATCGATTTGC^^ 

AAATAGACTCTTCTGCCCTTGGTTTAAGTGGTTTTAGTGTTGCCGGTGGGGCGCTAAAAT 

TAAGCGATACAGTGACGCAGGTCGGCGATGGTTCAGCCGOJCCAGTTAAAGTGGATC^ 

ATGCAGCAGCAACAGATATTGGTACTGCTTTGGGGO^AAAGGTTAATGC^ 

CGTTGCACAATATCTTAGACAAAGATGGTGCGGCAACTGAGAACTATGTTGTTAGC^^ 

GTAGTGATAATTACGCTGCATCTGTTGCy^TGACGGGACTGTAACTCTT 

ATATTACTTATTCAGGCGGTGATATTACCGGaSCTACCAAAGATGATACGTTG^ 

TTGCTGCTAATTCTGACGGAGAGGCCGTTGGTTTCGCTACCGTTCAGGGTAAGAATTATC 

AT^TTACAGATGGTGTAAAAAACCAGTCCACTGCIXKyvCCAACCGATATTGCT 

TTGATCTGGATACGGCI^TGAATTTACrGGGGCTTCCAC^ 

TAGACAAAGOTATTGCACAGGTTGATACTTTCCGCTCCTCCCTCGGTGCCGTTCAAAA^ 
GTCTGGATTCCGCAGTCACCAACCTGAACAACACTACTACCaUVCCTGTCTGAAGCGCAGT 
CCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGTCGAAAGCGCAGATCATCCAGCAGGCC
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACT

CAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCT 

GGCTTGCGTATTAACAGCGCGAAGGATGACGCAGCGGGTCAGGCGATTGCTAACCGTTTT 

ACTTCTAATATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAATGACGGTATTTCTCTG 

GCGCAGACCACTGAAGGCGCACTGTCTGAAATCAACAACAACTTGCAGCGTGTGCGTGAA 

CTGACCGTACAGGCGACAACCGGAACGAACTCCGAATCTGACCTGTCCTCTATCCAGGAC 

GAAATCAAATCCCGTCTGGAAGAGATTGACCGCGTATCCGGCCAGACTCAGTTCAACGGC

GTGAATGTGCTGGCAAAAGACGGCACCATGAAAATTCAGGTAGGCGCG/^CGATGGTCAG 

ACTATCTCTATCGATCTGAAAAAAATCGACTCTTCAACCCTGGGCCTGACCGGTTTTGAT 

GTTTCGACGAAAGCGAATATTTCTACGACAGCAGTAACGGGGGCGGCAACGACCACTTAT 

GCTGATAGCGCCGTTGCAATTGATATCGGAACGGATATTAGCGGTATTGCTGCTGATGCT 

GCGTTAGGAACGATCAATTTCGATAATACAACAGGCAAGTACTACGCACAGATTACCAGT 

GCGGCCAATCCGGGCCTTGATGGTGCTTATGAAATCCATGTTAATGACGCGGATGGTTCC 

TTCACTGTAGCAGCGAGTGATAAACAAGCGGGTGCTGCTCCGGGTACTGCTCTGACAAGC 

GGTAAAGTTCAGACTGCAACCACCACGCCAGGTACGGCTGTTGATGTCACTGCGGCrAAA 

ACTGCTCTGGCTGCAGCAGGTGCTGACACGAGTGGCCTGAAACTGGTTCAACTGTCCAAC 

ACGGATTCCGCAGGTAAAGTGACCAACGTGGGTTACGGCCTGCAGAATGACAGCGGCACT 

ATCTTTGCAACCGACTACGATGGCACCACTGTGACCACGCCGGGCGCAGAGACTGTGACT 

TACAAAGATGCTTCCGGTAACAGCACCACTGCGGCTGTCACACTGGGTGGCTCTGATGGC 

AAAACCAATCTGGTTACCGCCGCTGACGGCAAAACGTACGGTGCGACTGCACTGAATGGT

GCTGATCTGTCCGATCCTAATAACACCGTTAAATCTGTTGCAGACAACGCTAAACCGTTG 

GCTGCCCTGGATGATGCAATTGCGATGGTCGACAAATTCCGCTCCTCCCTCGGTGCGGTG 

CAAAACCGTCTGGATTCCGCT^GTCACCAACCTGAACAACACCACTACCAACCTGTCTGAA 

GCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAACATGTCGAAAGCG 

CAGATTATCCAGCAGGCAGGTAACTCCGTGCTGTCCAAAGCTAACCAGGTTCCGCAGCAG 

GTTCTGTCTCTGCTGCAGGGTTAA 
AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAGCGCCTCTCTTCTGGT

CTGCGTATTAACAGCGCTAAAGATGACGCCGCGGGCCAGGCGATTGCTAACCGCTTTACT 

TCTAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCAACGACGGTATTTCTCTGGCG 

CAGACGGCTGAAGGCGCGCTGTCAGAGATTAACAACAACTTGCAGCGTATTCGTGAACTC 

ACCGTTCAGGCCrCTACCGGCACGAACTCrrGATTCCGACCTGTCTTCTATTCATC 

ATCAAATCCCGTCrroATGAAATTGACCGTGTATCTGGTCAGACCCAGTTCAACG^ 

AACGTGCTGTCGAAAAACGATTCGATGAAGATTCAGATTGGTGCCAATGATAACCAGACG

ATCAGCATTGGCTTGCAACAAATCGACAGTACCACTTTGAATCTGAAAGGATT^ 

TCCGGCATGGCGGATTTCAGCGCGGCGAAACTGACGGCTGCTGATGGTACAGCAATTC 

GCTGCGGATGTCAAGGATGCTGGGGGTAAACT^GTCAATTTACTGTCTTACy^CTGA^^ 

GCGTCTAACAGTACTAAATATGCGGTCGTTGATTCTGCAACCGGTAAATACATGGAAGCC 

ACTGTAGCCATTACCGGTACGGCGGCGGCGGTAACTGTTGGTGCAGCGGAAGTGGCGGGA 

GCCGCTACAGCCGATCCGTTAAAAGCACTGGATGCCGCAATCGCTAAAGTCGACAAATTC 

CGCTCCTCCCrrCGGTGCCGTTCAAAACCGTCTGGATTCTGCGGTCACCAACCTGAACAAC 

ACCACCACCAACCTCTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAA 

GTGTCCAACATGTCGAAAGCGCAGATTATCCAGCAGGCCGGTAACTCCGTGCTGGCAAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTC

AAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTG 

GCTTGCGTATTAACAGCGCGAAGGATGACGCAGCGGGTCAGGCGATTGCTAACCGTTTTA 

CCTCTAACATTAAAGGTCTGACTCAGGCTGCACGTAACGCCAACGACGGTATTTCTGTTG 

CT^CAGACa^CTGAAGGCGCGCTGTCCGAAATCAACAACAACTTACAGCGTATCCGTGi^ 

TGACGGTTCAGGCTTCTACCGGGACTAACTCCGATTCGGATCTGGACTCCATTCAGGACG 

AAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGTCAAACCCAGTTCAACGGTG 

TGAACGTACTGGCGAAAGACGGTTCGATGAAAATTCAGGTTGGTGCGAATGACGGCCAGA 

CTATCACGATTGATCTGAAGAAAATTGACTCAGATACGCTGGGGCTGAATGGTTTCAACG 

TTAATGGCAAAGGCACTATTGCGAACAAAGCTGCTACyiGT«GCGATCrGACCGCTGCTC 

GTGCAACGGGAACAGGTCCTTATGCTGTGACCTICAAACAATACAGCACTCAGCGCTAGCG 

ATGCACTGTCTCGCCTGAAAACaSGAGATAOVGTTACTACTACTGGCTCGAGTGCTGCGA 

TCTATACTTATGATGCGGCTAAAGGGAACTTCACCACTCAAGCAACAGTTGCAGATGGCG 

ATGTTGTTAACTTTGCGAATACTCTGAAACCAGCGGCTGGOVCTACTGCA 

ATACrCGTAGTACTGGTGATGTGAAGTTTGATGTAGATGCTAATGGCGATGTGACC^^ 

GTGGTAAAGCCGCGTACCTGGACGCCACTGGTAACCTATCTACAAACAACCCCGGCATTG 

CATCTTCAGCGAAATTGTCCGATCTGTTTGCTAGOSGTAGTACCTTAGCGA 

CTATCCAGCTGTCTGGCACAACTTATAACTTTGGTGCAGCGGCAACTTCT 

ACACCAAAACTGTAAGCGCTGATACTGTACTGAGCACAGTGCAGAGTGCTGCAACG^ 

ACACAGCAGTTACTGGTCCGACAATTAAGTATAATACAGGTATTCAGTCTGCAAC^ 

CCTTCGGTGGTGTGAATACTAATGGTGCTGGTAATTCGAATGACACCTATACTGATGCAG 

ACAAAGAGCTO^Ca^CTUlCCGCATCTTACACTATCAACTACAACGTCGATAAGGATACCG 

GTACAGTAACTGTAGCTTCAAATGGCGCAGGTGCAACTGGTAAATTTGCAGCTACTC 

GGGCACAGGCTTATGTTAACTCTACAGGCAAAOTGACCACTGAAACCACCAGT^^ 

CTGCAACCAAAGATCCTCTGGCTGCCCTGGATGAAGCTATCAGCTCCATCGACAAAT^^ 

GTTCATCCCTGGGTGCTATCCAGAACCGTCTGGATTCCGCGGTTACa^CCTGAACAAC^ 

CCACTACCAACCTGTCCGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAG 

TGTCa^CATGTOSAAAGCGCAGATTATCCAGCAGGCCGGTAACTCCGTGCTGGCAA 

CCAACCAGGTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCAC

TCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTC 

TGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTT 

TACTTCTAATATTAAAGGCCTGACTCAGGCTGCACGTAAaSCCAATGACGGTATTTCTGT 

TGCAOVGACCACTOAAGGCGCGCTGTCCGAAATCAACAACAACTTAC^^ 

ACTGACCGTTCAGGCGACCACCGGTACCAACTCCCAGTCTGATCTGGACTCTATCCAGGA

CGAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGTCAGACTCAGTTCAACGG 

CGTGAACGTACTGGCAAAAGACGGTTCCATGAAAATTCTIGGTTGGCGCGAATGATGGCCA 

GACa^TCACTATCGACCTGAAGAAGATTGACrCTTCTAa3TTGAAACIX3AC^ 

CGTGAATGGTTCTGGTTCTGTGGCGAATACTGCGGCGACTAAAGACGAACTGGCTGCTGC 

TGCTGCGGCGGCGGGTACAACTCCrcCTGTCGGTACTGACGGCGTGACCAAATATACCGT 

AGACGCAGGGCTTAACAAAGCCACAGCAGCAAACGTGTTTGCAAACCTTGCAGATGGTGC 

TGTTGTTGATGCTAGCATTTCCAACGGTTTTGGTGOVGCAGCJVGCCACAGACTACACCT 

CAATAAAGCTACAAATGATTTCACTTTCAATGCCAGCATTGCTGCTGGTGCTG 

TGATAGTAACAGOKZAGCTCTGCAATCCTTCCTGACTCCAAAAGCAGGTGATAC^ 

CCTGAGCGTCAAAATCGGTACGACATCTGTTAATGTTGTTCTGGCGA 

TACAGCGAAAGATGGCTCAGCTCTGTATATCGACTCAACGGGTAACCTGACTCAGAACAG 

CGCAGGCACTGTAAOWSCAGCAACCCTGGATGGACTGACCAAAAACCATGATGCGA^ 

AGCTGTTGGTGTTGATATCACGACCGCAGATGGCGCAACTATCTCTCTGGCAGGCTCTO 

TAACGCGGCAACAGGTACTCAATCAGGTGCAATTACACTGAAAAATGTTCGTATCAGTGC 

TGATGCTCTGCAGTCTGCTGCGAAAGGTACnSTTATCAATGTTGATAATGGTGCTGATGA 

TATTTCTGTTAGTAAAACCGGGTGTCGTTACTACCGGAGGTGCGCCTACTTATACTGATG 

CTGATGGTAAATTAACGACAACCAACACCGTTGATTATTTCCTGCAAACTGATG^ 

TAACCAATGGTTCTGGTAAAGGGGTTTACACCGATGCAGCTGGTAAATTCACTACCGACG 

CTGCAACCAAAGCCGCAACCACCACCGATCCGCTGAAAGCCCTTGAT^ 

AGATCGATAAGTTCCGTTCATCCCTGGGTGCTATCCAGAACCGTCTGGATTCCGCGGTTA 

CCAACCTGAACAACACCACTACCAACCrGTCOSAAGCGCAGTCCCGTATTCAGGACGCCG 

ACTATGCGACCGAAGTGTCCAATATGTCGAAAGCGCAGATCATCCAGCAGGCCGGTAACT 

CCGTGTTGGCAAAAGCTAACCy^GGTACCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
AACAAATCTCAGTCTTCTCTTAGCTCTGCTATTGAGCGTCTGTCTTCTGGT

CTGCGTATTAACAGCGCAAAAGACGATGCAGCAGGTCAGGCGATTGCTAACCGTTTTACG 

GCAAATATTAAAGGTCTGACCCAGGCTTCCCGTAACGCGAATGATGGTATTTCTGTTGCG 

CAGACCACTGAAGGTGCGCTGAATGAAATTAACAACAACCTGCAGCGTATTCGTGAACTT 

TCTGTTCAGGCyUlCTAACGGTACTAACTCTGACAGCGATCTTTCTTCTATCCAGGCTGAA 

ATTACTCAACGTCTGGAAGAAATTGACCGTGTATCTGAGCAAACTCAGTTTAACGGCGTG 

AAAGTCCTTGCTGAAAATAATGAAATGAAAATTCAGGTTGGTGCTAATGATGGTGAAACC 

ATCACTATCAATCTGGCAAAAATTGATGCGAAAACTCTCGGCCTGGACGGTTTTAATATC 

GATGGCGCGCAGAAAGCAACCGGCyVGTGACCTGATTTCTAAATTTAAAGCGACAGGTACT 

GATAATTATCAAATTAACGGTACTGATAACTATACTGTTAATGTAGATAGTGGAGTAGTA 

CAGGATAAAGATGGCAAACAAGTTTATGTGAGTGCTGCGGATGGTTCACI^ACGACCAG 

AGTGATACTCAATTCTVAGATTGATGCAACTAAGCTTGCAGTGGCrGCTAAAGATT^ 

CAAGGTAATAAGATTGTCTACGAAGGTATCGAATTTACAAATACCGGCACTGGCGCTATA 

CCTGCCACAGGTAATGGTGAATTAACCGCCAATGTTGATGGTAAGGCTGTTGAATTCACT 

ATTTCGGGGAGTGCTGATACATCAGGTACTAGTGCAACCGTTGCCCCTACGACAGCCCTA 

TACAAAAATAGTGCAGGGCAATTGACTGCAACAAAAGTTGAAAATAAAGCAGCGACACTA 

TCTGATCTTGATCTGAACGCTGCCAAGAAAACAGGAAGCACGTTAGTTGTTAACGGTGCA 

ACTTACGATGTTAGTGCAGATGGTAAAACGATAACGGAGACTGCTTCTGGTAACAATAAA 

GTCATGTATCTGAGCAAATCAGAAGGTGGTAGCCCGATTCTGGTAAACGAAGATGCAGCA 

AAATCGTTGCAATCTACCACCAACCCGCTCGAAACTATCGACAAAGCATTGGCTAAAGTT 

GACy^TCTGCGTTCTGACCTCGGTGCAGTACAAAACCGTTTCGACTCTGCCATCACCAAC 

CTTGGCAACACCGTAAACAACCrGTCTTCTGCCCGTAGCCGTATCGAAGATGCTGACTAC 

GCGACCGAAGTGTCTAACATGTCTCGTGCGCAGATCCTGCAACAAGCGGGTACCTCTGTTCTGGCACAG
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCAC

TCAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTC 

TGGCITGCGTATTAACAGCGCGAAGGATGACGCAGCGGGTCAGGCGATTGCTAACCGTTT 

CACCTCTAACATTAAAGGCCTGACTCAGGCGGCCCGTAACGCCAACGACGGTATCTCCGT 

TGCGCAGACCACCGAAGGCGCGCTGTCCX3AAATCAACAACAACTTACAGCGTGTGCGTGA 

ACTGACGGTACAGGCa^CTACCGGTACTAACTCTGAGTCTGATCTGTCTTCTATCCAGGA 

CGAAATTAAATCCCGTCTGGATGAAATTGACCGCGTATCTGGTCAGACCCAGTTCAACGG 

CGTGAACGTGCTGGCAAAAAATGGCTCCATGAAAATCCT^GGTTGGCGCAAATGATAACC^ 

GACTATCa^CTATCGATCTGAAGCAGATTGATGCTAAAACTCTTGGCCrTGATC 

CGTTAAAAATAACGATACAGTTACCACTAGTGCTCOVGTAACTGCTTTTGGTGCTACCAC 

CACAAACAATATTAAACTTACTGGAATTACCCTTTCTACGGAAGCAGCCACTGATAC^^ 

CGGAACTAACCCAGCTTCJATTGAGGGTGTTTATACTGATAATGGTAATGATTACTATGC 

GAAAATCACCGGTGGTGATAACGATGGGAAGTATTACGCAGTAACAGTTGCTAATGATGG 

TACAGTGACAATGGCGACTGGAGCT^CGGCAAATGCAACTGTAACTGATGCT^ 

TAAAGCTACAACTATCACTTCAGGCGGTACACCPGTTCAGATTGATAAT^^ 

CGCAACTGCCAACCTTGGTGCTGTTAGCTTAGTAAAACTGCAGGATT^ 

TACCGATACATATGCGCTTAAAGATACAAATGGCAATCTTTACGCrGCGGATC^ 

AACTACTGGTGCTGTTTCrrGTTAAAACTATTACCTATACTGACTCTTCCGGTGCCGCC^ 

TTCTCCAACCGCGGTCAAACTGGGCGGAGATGATGGCAAAACAGAAGTGGTCGATATTGA 

TGGTAAAACATACGATTCTGCCGATTTAAATGGCGGTAATCTGCJ^CAGGT^ 

TGGTGGTGAGGCTCTGACTGCTGTTGCAAATGGTAAAACCACGGATCCGCTGAAAGCGCT 

GGACGATGCTATCGCATCTGTAGACAAATTCCGTTCTTCCCTCGGTGCGGTGCAAAACCG 

TCTGGATTCOSCGGTTACCAACCTGAACAACaVCCACTACCAACCTGTCTGAAGCGC^ 

CCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGTCGAAAGCGCAGATCAT 

CCAGCAGGCCGGTAACTCCGTGTTGGCAAAAGCTAACCAGGTACCGCAGCAGGTTCTC 

TCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACT

CAAAATAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCT 

GGCITCCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGATTGCTAACCGTTTT 

ACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCa^CGACGGTATTO 

GCGCAGACCACCGAAGGCGCGCTGTCTGAAATCAACAACT^CTTACAGCGTATTCGTGAA 

CTGACGGTTCAGGCTTCTACCGGGACTAACTCTGATTCGGATCTGGACTCCATTCAGGAC 

GAAATCAAATCCCGTCTGGACGAAATTGACCGCGTATCCGGTCAAACCCAGTTCAACGGT

GTGAACGTACTGGCGAAAGACGGTTCGATGAAAATTCAGGTTGGTGCGAATGACGGCCAG 

ACTATCACTATTGATCTGAAGAAAATTGACTCTGATACGCTGGGGCTGAATGGT^^ 

GTTAACGGCAAAGGTACTATTGCGAACAAAGCGGCAACCATTAGTGATCro^ 

GGGGCGAATGTTACTAACTCAAGCAATATTGTTGTCyVCGACAAAGTTCy^TC^ 

GCAGCGACTGCATTTAGCAAACTCyUUlGATGGTGATTCTGTTGCCXSTTGCTGCTCAGAA^ 

TATACTTATAACGCATCGACCAATGATTTTACGACy^GAAAATACAGTAGCGACAGGCACT 

GCAACGACAGATCTTGGCGCTACTCTGAAGGCTGCTGCTGGGCAGAGTCAAT^^ 

TATACCTTTGCAAATGGTAAAGTTAACTTTGATGTTGATGCAAGCGG 

GGCGGCGAAAAGGCTTTCTTGGTTGGTGGAGCGCTGACTACTAAC^ 

ACTCCAGCAACGATGTCTTCCCTCTTTAAGGCCGOSGATGACAAAGATGCaSCT 

TCGATTGATTTTGGCGGGAAAAAATACGAATTTGCrrGGTGGCAATTCT 

GGCGTTAAATTCAAAGACACGGTGTCTTCTGACGCGCTTTTGGCTCAGGTTAAAGC^ 

AGTACTGCTAATAATGTAAAAATCACCTTTAAO^TGGTCCTCTGTCATTCAC^ 

TTCCAAAATGGTGTATCnKSGCTCCGCGGCATCGAATGCAGCCTACATTGATAG^ 

GAACTGACAACTACTGAATCCTACAACACAAATTATTCCXSTAGACAAAGAavaSGGOT 

GTAAGTGTTACAGGGGGGAGCGGTACGGGTAAATACGCCGCAAACGTGGGTGCTCAGGCT 

TATGTAGGTGCAGATGGTAAATTAACCACGAATACTACTAGTACCGGCTCTGCAACCAAA 

GATCCACTAAATGCGCTGGATGAGGCAATTGCATCCATCGACAAATTCCGTTCTTCCCTG 

GGGGCTATCCAGAACCGTCTGGATTCCGCAGTCACa^CCTGAACAACACavCTACC^ 

CTGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAACATG 

TCGAAAGCGCAGATCATCCAGCAGGCCGGTAACTCCGTGTTGGCT^AAAGCTAACCAGGTA 

CCGCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
AACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTC

TTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCGGGTCAGGCGATTGCTAACCG 

TTTTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAACGACGGTATT^ 

TGTTGCGCAGACCACCGAAGGCGCGCTGTCCGAAATTAACAACAACTTACAGre 

TGAGCTGACTGTTCAGGCGACCACCGGTACTAACTCTGAGTCTGACCTGTCTTCTATCCA 

GGACGAAATCAAATCTCGCCTGGAAGAGATTGATCGTGTTTCAAGTCAGACTCAATTTAA 

CGGCGTGAATGTTTTGGCTAAAGATGGGAAAATGAACATTCAGGTTGGGGCAAGTGATGG 

ACAGACTATCACTATTGATCTGAAAAAGATCGATTCATCTAOVCTAAACCTCTCCAGTTT 

TGATGCTACAAACTTGGGCACCAGTGTTAAAGATGGGGCa^CCATCAATAAGCAAGTG^ 

AGTAGATGCTGGCGACTTTAAAGATAAAGCTTCAGGATCGTTAGGTACCCTAAAATTAGT 

TGAGAAAGACGGTAAGTACTATGTAAATGACACTAAAAGTAGTAAGTACTACGATGCCGA 

AGTAGATACTAGTAAGGGTGAAATTAACTTCAACTCTACAAATGAAAGTGGAACTACTCC 

TACTGCAGCGACGGAAGTAACTACTGTTGGCCGCGATGTAAAATTGGATGCTTCTGCACT 

TAAAGCCAACCAATCGCTTGTOSTGTATAAAGATAATIAGCGGCAATGATGCTTATATCAT 

TCAGACCAAAGATGTAACAACTAATCAATCAACTTTCAATGCaSCT 

TGGTGTTTTATCTATTGGTGCATCTACAACCGCGCCAAGCAATTTAACAG 

TAAGGCTCTTGATGATGCAATTGCATCTGTTGATAAATTCCGCTC^ 

TCAGAACCGTCTGGATTCTGCCATTGCCAACCTGAACyVACACCACTACCAACC 

AGCGCAGTCCCGTATTCAGGACGCTGACTATGOSACCGAAGTGTCCAACATGTCGAAAGC 

GCAGATTATCCAGCAGGCCGGTAACTCCGTGCTGGCAAAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAA 

TAATATCAACAAGAACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTGGCTT 

GCGTATTAACAGCGCGAAGGATGACGCAGCGGGTCAGGCGATTGCTAACCGTTTCACCTC 

TAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCTAACGATGGTATCTCTCTGGCGCA 

GACCACTGAAGGCGCACTGTCnSAGATTAACAACAACTTACT^CGTC 

TGTACAGGCGACCACCGGTACTAACTCTGATTCTGACCTGGCTTCTATTCAGGACGAAAT 

CAAATCCCGTTTGTCTGAAATTGACCGCGTATCCGGGCAGACCCAGTTCAACGGCGTGAA 

CGTATTGTCTAAAGATGGCTCCCTGAAAATTCAGGTTGGCGCAAATGATGGTCyVGACTAT 

CTCTATCGACCTGAAGAAAATTGACTCTGATACTCTGGGTTTGAATGGTTTC^ 

TGGTTCTGGTACCATTGCAAACAAAGCGGCCACAATCAGTGACrrTGACTGCrr^^ 

CGTTGACAACGGTAATGGTACTTATAAAGTTACAACTAGCAACGCTGCACTTAC^ 

TCAGGCATTAAGTAAGCTGAGTGATGGCGATACTGTAGATATTGCAACCTATGCTGGTGG 

TACAAGTTCAACAGTTAGTTATAAATACGACGCAGATGCAGGTAACTTCAGTTATAACAA 

TACTGCAAACAAAACAAGTGCTGCGGCTGGAACTCTGGCAGATACTCTTCTCCC^ 

TGGCCAGACTAAAACCGGTACTTACyVAGGCTGCTACTGGTGATGTTAACTTO 

CGCAACTGGTAATCTGACAATTGGCGGAO^GCAAGCCrACCT^^ 

TACAACAAACAACTCCGGTGGTGCGGCTACTGCAACTCrrrAAAGAG 

TGGCGATGGTAAATCTCTGGGGAACGGCGGTACTGCTACCGTTACTCTGGATAATACTAC 

GTATAATTTCAAAGCTGCTGCGAACGTTACTGATGGTGCTGGTGT«TCGCTGCTGOT 

TGTAACTTATACAGCCACTGTTTCTAAAGATGTCATTCraGCACAACTGC^ 

TCAGGCAGCAGCAACCGCTACCGACGGTGATACTGTCGCAACGATCAACTATAAATCTGG 

TGTCATGATCGGTTCCGCTACCTTTACCAATGGTAAAGGTACTGCCGATGGTATGACTTC 

TGGTACT^CTCCAGTCGTAGCTACAGGTGCTAAAGCTGTATATGTTGATGGCAACAATGA 

ACTGACTTCCACTGCATCTTACGATACGACITACTCTGTCAACGCAGATACAGGCGCAGT 

AAAAGTGGTATCAGGTACTGGTACTGGTAAATTTGAAGCTGTTGCTGGTGCGGATGCTTA 

TGTAAGCAAAGATGGCAAATTAACGACAGAAACCACCAGTGCAGGCACTGCAACC^^ 

TCCTTTGGCTGCCCTGGATGCTGCTATCAGCTCCATCGACAAATTCCGTTCCTCCC 

TGCTATCCAGAACCGTCTGGATTCCGCAGTCACCAACCTGAACAACACCACTACTAACCT 

GTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAATATGTC 

GAAAGCGCAGATCATCCAGCAGGCCGGTAACTCTGTGTTGGCAAAAGCTAACCAGGTACC 

GCAGCAGGTTCTGTCTCTGCTGCAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCC 

TCTCGCTGATCACTCT^AAATAATATCAACAAGAACCAGTCTCOSCTGTCGAGTTCTATCG 

AGCGTCTGTCTTCTGGCTTGCGTATTAACAGCGCGAAGGATGACGCCGCAGGTCAGGCGA 

TTGCTAACCGTTTTACTTCTAACATTAAAGGCCTGACTCAGGCTGCACGTAACGCCAACG 

ACGGTATTTCIX3TTGCAC7^GACCACTGAAGGCGa3CTGTCCGAAATCAA(^ 

AGCGTATTCGTGAACTGACGGTTCAGGCrTCTACCGGGACTAACTCTGATTCGGATCTGG 

ACTCCATTCAGGACGAAATCy^TCCCGTCTCGACGAAATTGACCGCGTTTCCGGTCAGA 

CCCAGTTOVACGGCGTGAACGTGCTGGCGAAAGACGGTTCGATGAAGATTCAGGTTGGCG 

CGAATGACGGGCAGACCATCTCTATCGATTTGCAGAAAATTGATTCTTCAACGCTGGGAT 

TGAAAGGTTTCTCGGTATCAGGGAACGCATTAAAAGTTAGCGATGCGATAACTACAGTTC 

CTGGTGCTAATGCTGGCGATGCCCaSGTTACGGTTAAATTTGGTGCGAACGATACCGCTG 

CTGCCGCAATGGCTAAAACATTGGGAATAAGTGATACATCAGGCTTGTCCCTACATAACG 

TACAAAGCGCGGATGGTAAAGCGACAGGAACCTATGTTGTTCAATCTGGTAATGACTTCT 

ATTCGGCTTCCGTTAATGCTGGTGGCGTTGTTACGCTTAATACCACCAATGTTACT 

CTGATCCTGCGAACGGTGTTACCACAGCAACACAGACAGGTaVGCCTATCAAGGTCACGA 

CGAATAGTGCTGGCGCGGCTGTTGGCTATGTTACTATTCAAGGCAAAGATTACCTTGCTG 

GTGCAGACGGTAAGGATGOIATTGAAAACGGTGGTGACGCTGCAACAAAT^^ 

AAATCCAACTTACCGATGAACTOGATGTTGATGGTTCTGTAAAAACAGCGGCAACA^ 

CATTTTCTGGTACTGCAACCAACGATCCGCTGGCACTTTTAGACAAAGCTATCT^^ 

TTGATACTTTCCGCTCCTCCCTCGGTGCCGTACAAAACCGTCTGGATTCTGCXSGTCACCA 

ACCTGAATAACACCACCACCAACCnX3TCIX3AAGCGCAGTCCCGTATTCAGGACGCCG^^ 

ATGCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATCATCCAGCAGGCGGGTAACTCTG 

TGCTGTCTAAAGCTAACCAGGTACCGCTlGCAGGTTCTGTCTCTGCrrGCAGGGTTAA 
CTTCTCTTAGCTCTGCTATTGAGCGTCTGTCTTCTGGTCTGCGTATTAACAGCGCAAAAG 

ACGATGCAGCAGGTCAGGCGATTGCTAACCGTTTTACGGCAAATATTAAAGGTCTGACCC 

AGGCTTCCCGTAACGCGAATGATGGTATTTCTGTTGCGCAGACCJVCTGAAGGTGCGCTGA 

ATGAAATTAACAACAACCTGCyiGCGTATTCGTGAACTTTCTGTTCAGGCAA 

CTAACTCTGACTVGCGATCTTTCTTCTATCCTlGGCTGAAATTACTa^CGTCTGGAAGAAA 

TTGACCGTGTATCTGAGCAAACTCAGTTTAACGGCGTGAAAGTCCTTGCTGAAAA 

AAATGAAAATTCAGGTTGGTGCTAATGATGGTGAAACCATTGACCTGCCCCCACGATTAG 

ATACy^CACTCAGTTAGTAACGTCGGAATCTTCT^TTCTCAGAATGACCC^ 

CGCTGCAAATTCAGACGGTGTCTGATAATTCAGCGTGGAGTGCGGGCGGCATTCGTTATA 

ATCCTGCCGCCAGTCATTAATAATTTTCCTGGCATGAACGATATCGCTGAACCAGTGCTC 

ATTCAAACATTCATCGCGAAATCGTCCGTTAAAGCTCrCAATAAATCCGTTCTGCGTTG^ 

CTTGCCCGGCTGGATTAAGCGCAACTCAACACCATGCTCAAAGGCCCATTGATCCAGT^ 

ACGGCAAGTGAACTCCGGCCCCTGGTCAGTTCTTATCGTCGCCGGATAGCCTCGAAACAG 

TGCAATGCTGTCCAGAATACGCGTGACCTGAACGCCTGAAATCCCAAAGGCAACAGTGAC 

CGTCAGGCATTCCTTTGTGAAATCATCGACGCAGGTAAGACACTTGATCCTGCGACCGGT 

GGA7VAGTGCGTCCATGACGAAATCCATCGACCAGGTCAGATTGGGCGCCGCCGGACGGAG 

CAGCGGCAGACGTTCrGTTGCCAGCCCTTTACGACGTCTTCTGCGTTTTACGCCCAGGC 

ACTGAGGTGATAAAGCCGGTACACGCGCTTATGATTAACATGAAGCCCITC^ 

CAACTGCCAAATACGACGGTAGCCAAAACGCCTGCGCTCCAGTGCCAGCTCaVGTGATG 

CCCTGATAAATGCGCATCAGCAGCCGGACGGTGAGCCTCATAGCGGCAGGTaSACAGOT 

TAAACCTGTAAGCCTGCAGGCACGACGTTGCGACAGACCGGTCGCATCACACATCAACAT 

CACGGCTTCCCGCTTCTGGTCTGTCGTCAGTACTTTCGCCCAAGAGCCACC^^ 

TCTTTATCCAGCATGGCTTCGGCAAGCTIGCTTCTTGAGTCTGGTGTTCTCTT 

GACTTCAGGCGCTTAACTTCAGGCACCTCCATACCGCCATACTTCTTACGCC^ 

aacgtggcatcggaaatggoitgcttgcggcagagttcacgggcgggtaccccagcto 

gcttcgcggagaataotgatgatctgttcgtcggaaaaacgcttcttcat^^ 

tcatgtggcttatgaagacattactaacatcggggtgtactaatcaacggggagcaggtc 

accatcactatcaatctggcaaaaattgatgcgaaaactctcggcctggacggttttaat 

atcgatggcgcgcagt^gcy^ccggcagtgacctgatttctaaatttaaagcga 

actgataattatcaaattaacggtactgataactatactgttaatgtagatagtggagta 

gtacaggataaagatggcaaacaagtttatgtgagtgctgcggatggttcacttacgacc 

agcagtgatactcaattcaagattgatgcaactaagcttgcagl^ctgcra^ 

gcrrcaaggtaataagattgtctacgaaggtatcgaatttacaaataccggcact^ 

atacctgcavcaggtaatggtaaattaaccgccaatgttgatggtaaggct 

actatttcggggagtgctgatacatcaggtacragtgcaaccgttgcccctacgacatc 

ctatacaaaaatagtgcagggcaattgactgcaacaaaagttgaaaataaagc^ 

ctatctgatcttgatctgaacgctgccaagaaaacaggaagcacgttagttgttaactc 

ggaacotacgatgttagtgcagatggtaaaacgataacggagactgcttctggt;^ 

aaagtovtgtatctgagcaaatcy^gaaggtggtagcccgattctggtaaacgaagatg^ 

gct^aaatcgttgcyvatctaccaccaacccgctcgaaactatcgac^^ 

gttgaovatctgcgttctgacctcggtgcagtacyvaaaccgtct 

aaccttggcaacacostaaacaaccrrgtcttctgcccgtagccgtatc^ 

tacgcgaccgaagtgtctaa»tgtctcgtga5cagatcctgcaacaagcgggtacctct 

gttctggcacaggctaacc 



Figure 56 



S:\P30384\FIG7-69 



AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAGCGCCTCTCT 

TCTGGTCTGCGCATTAACAGCGCTAAAGATGACGCTGCGGGCCAGGCGATTGCTAACCGC 

TTCACTTCTAACATCAAAGGTCTGACTCAGGCCGCACGTAACGCCAACGACGGTATCTCT 

CTGGCGCAGACCACTGAAGGCGCACTGTCTGAAATCAACAACAACTTGCAGCGTGTTCGT 

GAACTGACCGTTCAGGCCACTACCGGTACTAACTCTGATTCTGACCTGTCTTCAATCCAG 

GACGAAATCAAATCCCGTCTCGATGAAATTGACCGCGTATCCGGTCAGACTCAGTTCAAC 

GGCGTGAACGTACTGGCAAAAGATGGCTCGATGAAAATTCAGGTCGGTGCAAATGATGGT 

CAGACAATCAGCATTGATTTGCAGAAGATTGATTCTTCTACTTTAGGGl^ 

TCTGTTTCCAAAAATGCAGTATCTGTTGGTGATGCTATTACTCAATTGCCI^ 

GCAGCCGATGCACCAGTAACCATCAAGTTTGATGATTOVGTT^AAAACTGATTT^^ 

ACCGATCCTTCAGGGTTAAGTCTGCATAACCTCAAAGATGAAAATGGTAATTTAACT^ 

CAGTATGTTGTACAGAATGGCGGAAAATCTTACGCTGCTACAGTCGCTGCCAATGGTAAT 

GTTACGCTGAACAAAGCAAATGTAACCTACAGCGATGTCGCAAACGGTATTGATACCGCA 

ACGCyiGTCAGGCCAGTTAGTTCAGGTTGGTGCAGATTCTACCGGTACGCCAAAAGCATTC 

GTGTCTGTCCAAGGTAAAAGCTTTGGCATTGATCACGCCGCCTTGAAGAATAACACTGGT 

GATGCTACCGCTACTCCACCGGGAACATCTGGGACAACAGTTGTCGCAGCXSTCA^ 

CTGAGTACGGGCAAAAACTCTGTAGACGCTGATGTAACGGCITCCACTGAACT 

GCTTCAACCAACGATCCyVCTGACTCTGCTGGACAAAGCTATCGaVTCTGTTGATA^ 

CGTTCTTCTTTGGGGGCGGTACAGAACCGTCTGAGCrCCGCTGTAACCAACCTGAAC;^ 

ACCACCACC7VACCTGTCTGAAGCGCAGTCCCGTATTCAGGACGCCGACT 

GTGTCCAACATGTCGAAAGCGCAGATTATCCAGCAGGCAGGTAACTCCGTGCTGTCCAAA 
AACAAAAACCAGTCTGCGCTGTCGACTTCTATCGAACGCCTCTCTTCTGG 

CCTGCGTATTAACAGTGCGAAAGATGACGCTGCCGGTCAGGCGATAGCTAACCGTTTCAC 

CTCTAACATTAAAGGCCTGACTCAGGCTGCGCGTAACGCCAACGACGGTATTTCTCTGGC 

GCAGACCAO^GAAGGTGCGTTGTCTGAAATCAACAACAACTTGCAACGTGTG 

GACCGTTCAGGCGACGACCGGTACTAACTCTGATTCTGACCTGTCATCTATTCAGGACGA 

AATCAAATCCCGTCTGGATGAGATTGACCGTGTTTCCGGTCAGACCCAGTTCAACGGCGT 

GAATGTACTGGOUJUVGACGGTTCGATGAAGATTCAGGTTGGCGCGAATG 

TATTAGCATTGATTTACAGAAAATTGACTCTTCTACATTAGGGTTGAATGGTl^ 

TTCTGCTCyUVTCACTTAACGTTGGTGATTCAATTACTCAAATTACAGGAGCCG 

AAAACCTGTTGGTGTTGATTTCACTGCTGTTGCGAAAGATCTGACTACTGCGACAGGTAA 

AACTGTCGATGTTTCCAGCCTGACGTTACACAACACCCTGGATGCGAAAGGGGCTGCCAC 

CGCACAGTTCGTCGTTCAATCCGGTAGTGATTTCTACTCCGCGTCCATTGACCATGCAAG 

TGGTGAAGTGACGTTGAATAAAGCCGATGTCGAATACAAAGACACCGATAATGGACTAAC 

GACTGCAGCTACrCAGAAAGATCAGCTGATTAAAGTTGCCGCTGACTCTGACGGCGCGGC 

TGCGGGATATGTAACATTCCAGGGTAAAAACTACGCTACAACGGCTCCAGCGGCGCTTAA 

TGATGACACTACGGCAACAGCCACy^GCGAACAAAGTTGTTGTTG^ 

TCCGACTGCGCAGTTCTCAGGGGCTTCTTCTGCTGATCCyVCTGG<^ 

CATTGOICAGGTTGATACTTTCCGCTCCTCCCTCGGTGCCGTTCAAAACCGTCTGGACTC 

TGCGGTAACCAACCTGAACAACACCACCACCAACCTGTCTGAAGCGCAGTCCCGTATTCA 

GGACGCCGACTATGCGACCGAAGTGTCTAACATGTCGAAAGCXXIAGATCATCCAGCAGGC 

GGGTAACTCTGTGCTGTCTAAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCACAGAC CACTGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATCCGTG AGCTGACGGT TCAGGCTTCT 
ACCGGGACTA ACTCTGATTC GGATCTGGAC TCCATTCAGG ACGAAATCAA ATCCCGTCTC 
GACGAAATTG ACCGCGTATC CGGTCAGACC CAGTTCAACG GCGTGAACGT ACTGGCAAAA 
GACGGTTCGA TGAAAATTCA GGTTGGTGCG AATGACGGTG AAACTATCAC TATCGACCTG 
AAGAAAATCG ATTCTGATAC TCTGGGTCTG AATGGTTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC CGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACCAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGG 
GCAGGCTACT GATTCAGCTA AAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TTAAAGCCGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACTGAATATA CTATCGCAAA 
AGCAACTCCT GCGACAACCT CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATT ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCTATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CTGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
TATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCCAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCAGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CGGCCCGTAA CGCCAACGAC GGTATTTCTG TTGCGCAGAC CACCGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATTCGTG AACTGACGGT TCAGGCCACT 
ACAGGGACTA ACTCCGATTC TGACCTGGAC TCCATCCAGG ACGAAATCAA ATCTCGTCTT 
GATGAAATTG ACCGCGTATC CGGCCAGACC CAGTTCAACG GCGTGAACGT GCTGGCGAAA 
GACGGTTCAA TGAAAATTCA GGTTGGTGCG AATGACGGCG AAACCATCAC GATCGACCTG 
AAAAAAATCG ATTCTGATAC TCTGGGTCTG AATGGCTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC AGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACTAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGC 
GCAGGCTGCT GATTCAGCTT CAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TCAAAGCAGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACAGAATATA CCATCGCAAA 
AGCAACTCCT GCGACAACCA CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATC ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCAATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CCGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
CATTCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCTAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCAGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCGCAGAC CACCGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATTCGTG AACTGACGGT TCAGGCCACT 
ACAGGGACTA ACTCCGATTC TGACCTGGAC TCCATCCAGG ACGAAATCAA ATCTCGTCTT 
GATGAAATTG ACCGCGTATC CGGCCAGACC CAGTTCAACG GCGTGAACGT GCTGGCGAAA 
GACGGTTCAA TGAAAATTCA GGTTGGTGCG AATGACGGCG AAACCATCAC GATCGACCTG 
AAAAAAATCG ATTCTGATAC TCTGGGTCTG AATGGCTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC AGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACTAAATC TACTGCTGGT ACGGGTGTAA ACGCCGCGGC 
GCAGGCTGCT GATTCAGCTT CAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TCAAAGCAGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACAGAATATA CCATCGCAAA 
AGCAACTCCT GCGACAACCA CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATC ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGCACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCAATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCGGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CCGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
CATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCTAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCACAGAC CACTGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATCCGTG AGCTGACGGT TCAGGCTTCT 
ACCGGGACTA ACTCTGATTC GGATCTGGAC TCCATTCAGG ACGAAATCAA ATCCCGTCTC 
GACGAAATTG ACCGCGTATC CGGTCAGACC CAGTTCAACG GCGTGAACGT ACTGGCAAAA 
GACGGTTCGA TGAAAATTCA GGTTGGTGCG AATGACGGTG AAACTATCAC TATCGACCTG 
AAGAAAATCG ATTCTGATAC TCTGGGTCTG AATGGTTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC CGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACCAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGG 
GCAGGCTACT GATTCAGCTA AAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TTAAAGCCGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACTGAATATA CTATCGCAAA 
AGCAACTCCT GCGACAACCT CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATT ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCTATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CTGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
TATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCCAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAGTCATTAATACCAACAGCCTCTCGCTGATCACTCAAAATAATATCAACAAG 

AACCAGTCTGCGCTGTCGAGTTCTATCGAGCGTCTGTCTTCTCGCrrGCGTATTAACAGC 

GCGAAGGATGACGCCGCy^GGTCAGGCGATTGCTAACCGTTTTACTTCTAACATTAAAGGC 

CTGACrCAGGOKSCCCGTAACGCCAACGACGGTATTTCTGTTGCGCAGACCA^ 

GCGCTGTCCGAAATCAACy^CAACTTACJ^GCGTATTCGTGAACTGACGGTTCAGGCCACT 

ACAGGGACTAACTCCGATTCTGACCTGGACTCCATCCAGGACGAAATCyUUVTCTCGTCTT 

GATGAAATTGACCGCGTATCCGGCCAGACCCAGTTCAACGGCGTGAACGTGCTGGCGAAA 

GACGGTTCAATGAAAATTCAGGTTGGTGCGAATGACGGCGAAACCATCACGATCGACCTG 

AAAAAAATCGATTCTGATACTCTGGGTCTGAATGGCTTTAACGTAAATGGTAAAG^ 

ATTACCAACAAAGCTGCAACGGTAAGTGATTTAACTTCTGCTGGCGCGAAGTTAAACAC 

ACGACAGGTCTTTATGATCTGAAAACCGAAAATACCITGTTAACTACCGATGCIXSCAOT 

GATAAATTAGGGAATGGCGATAAAGTCACAGTTGGCGGCGTAGATTATACTTACAACGCT 

AAATCTGGTGATTTTACTACCACTAAATCTACTGCTGGTACGGGTGTAGACGCCGCGGCG 

CAGGCTGCrrGATTa\GCTTCAAAACGTGATGCGTTAGCTGCCACCCTTCATGCrc 

GGTAAATCTGTTAATGGTTCTTACACCACAAAAGATGGTACnXSTTTCTT^ 

TCAGCAGGTAATATOVCaiTCGGTGGAAGCCAGGCATACGTAGACGATGCAGGCAAC^^ 

ACGACTAACAACGCTGGTAGCGCAGCTAAAGCTGATATGAAAGCGCTGCTCa^ 

AGCGAAGGTAGTGACGGTGCCTCTCTGACATTCAATGGCACyVGAATATACCATCGCAA^ 

GCAACTCCTGCGACAACCACTCCAGTAGCTCCGTTAATCCCTGGTGGGATTACTTATC^ 

GCTACAGTGAGTAAAGATGTAGTATTGAGCGAAACCAAAGCGGCTGCCGCGACATCTTCA 

ATTACCTTTAATTCCGGTGTACTGAGCAAAACTATTGGGTTTACCGCGGGTGAATCCAGT 

GATGCTGCGAAGTCTTATGTGGATGATAAAGGTGGTATCACTAACGTTGCCGACTATACA 

GTCTCTTACAGCGTTAACAAGGATAACGGCTCTGTGACTGTTGCCGGGTATGCTTCAGCG 

ACTGATACCmTAAAGATTATGCTCCAGCAATTGGTACTGCTGTAAATGTGAACTCCGCG 

GGTAAAATCACTACTGAGACTACCAGTGCTGGTTCTGCAACGACCAACCCXSCTTGCTGC^ 

CTGGACGACGCAATCAGCTCCATCGAO^TTCCGTTCITCCCTGGGTGCrrATC^^ 

CGTCTGGATTCCGCAGTCTVCCyUVCCTGAACAACACCACTACCAACCTGTCCGAAGCGCAG 

TCCCGTATTCAGGACGCCGACTATGCGACCGAAGTGTCCAACATGTCGAAAGCGCAGATC 

ATTCAGCAGGCCGGTAACTCCGTGCTGGCAAAAGCTAACCTIGGTACCGCAGCAGGTTCTO 

TCTCTGCTGCAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCAGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCGCAGAC CACCGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATTCGTG AACTGACGGT TCAGGCCACT 
ACAGGGACTA ACTCCGATTC TGACCTGGAC TCCATCCAGG ACGAAATCAA ATCTCGTCTT 
GATGAAATTG ACCGCGTATC CGGCCAGACC CAGTTCAACG GCGTGAACGT GCTGGCGAAA 
GACGGTTCAA TGAAAATTCA GGTTGGTGCG AATGACGGCG AAACCATCAC GATCGACCTG 
AAAAAAATCG ATTCTGATAC TCTGGGTCTG AATGGCTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC AGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACTAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGC 
GCAGGCTGCT GATTCAGCTT CAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TCAAAGCAGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACAGAATATA CCATCGCAAA 
AGCAACTCCT GCGACAACCA CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATC ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGCACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCAATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCGGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CCGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
CATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCTAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCACAGAC CACTGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATCCGTG AGCTGACGGT TCAGGCTTCT 
ACCGGGACTA ACTCTGATTC GGATCTGGAC TCCATTCAGG ACGAAATCAA ATCCCGTCTC 
GACGAAATTG ACCGCGTATC CGGTCAGACC CAGTTCAACG GCGTGAACGT ACTGGCAAAA 
GACGGTTCGA TGAAAATTCA GGTTGGTGCG AATGACGGTG AAACTATCAC TATCGACCTG 
AAGAAAATCG ATTCTGATAC TCTGGGTCTG AATGGTTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACACC 
ACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC CGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACCAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGG 
GCAGGCTACT GATTCAGCTA AAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TTAAAGCCGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACTGAATATA CTATCGCAAA 
AGCAACTCCT GCGACAACCT CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTTCTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATT ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCTATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CTGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
TATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCCAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCAGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CGGCCCGTAA CGCCAACGAC GGTATTTCTG TTGCGCAGAC CACCGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATTCGTG AACTGACGGT TCAGGCCACT 
ACAGGGACTA ACTCCGATTC TGACCTGGAC TCCATCCAGG ACGAAATCAA ATCTCGTCTT 
GATGAAATTG ACCGCGTATC CGGCCAGACC CAGTTCAACG GCGTGAACGT GCTGGCGAAA 
GACGGTTCAA TGAAAATTCA GGTTGGTGCG AATGACGGCG AAACCATCAC GATCGACCTG 
AAAAAAATCG ATTCTGATAC TCTGGGTCTG AATGGCTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC AGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACTAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGC 
GCAGGCTGCT GATTCAGCTT CAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TCAAAGCAGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACAGAATATA CCATCGCT^ 
AGCAACTCCT GCGACAACCA CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATC ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCAATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CCGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
CATTCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCTAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCACAGAC CACCGAAGGC 
GCGCTGTCTG AAATCAACAA CAACTTACAG CGTATCCGTG AGCTGACGGT TCAGGCTTCT 
ACCGGAACTA ACTCTGATTC GGATCTGGAC TCCATTCAGG ACGAAATCAA ATCCCGTCTT 
GATGAAATTG ACCGCGTATC CGGCCAGACC CAGTTCAACG GCGTGAACGT ACTGGCAAAA 
GACGGTTCGA TGAAAATTCA GGTTGGTGCG AATGACGGTG AAACTATCAC TATCGACCTG 
AAGAAAATCG ATTCTGATAC TCTGGGTCTG AATGGTTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC CGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACCAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGG 
GCAGGCTACT GATTCAGCTA AAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TTAAAGCCGC 
GAGCGAAGGT AGTGACGGTG CTTCTCTGAC ATTCAATGGC ACTGAATATA CTATCGCAAA 
AGCAACTCCT GCGACAACCT CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTACTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATT ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCTATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CTGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
TATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCCAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 
ATGGCACAAG TCATTAATAC CAACAGCCTC TCGCTGATCA CTCAAAATAA TATCAACAAG 
AACCAGTCTG CGCTGTCGAG TTCTATCGAG CGTCTGTCTT CTGGCTTGCG TATTAACAGC 
GCGAAGGATG ACGCCGCGGG TCAGGCGATT GCTAACCGTT TTACTTCTAA CATTAAAGGC 
CTGACTCAGG CTGCACGTAA CGCCAACGAC GGTATTTCTG TTGCACAGAC CACTGAAGGC 
GCGCTGTCCG AAATCAACAA CAACTTACAG CGTATCCGTG AGCTGACGGT TCAGGCTTCT 
ACCGGGACTA ACTCTGATTC GGATCTGGAC TCCATTCAGG ACGAAATCAA ATCCCGTCTC 
GACGAAATTG ACCGCGTATC CGGTCAGACC CAGTTCAACG GCGTGAACGT ACTGGCAAAA 
GACGGTTCGA TGAAAATTCA GGTTGGTGCG AATGACGGTG AAACTATCAC TATCGACCTG 
AAGAAAATCG ATTCTGATAC TCTGGGTCTG AATGGTTTTA ACGTAAATGG TAAAGGTACT 
ATTACCAACA AAGCTGCAAC GGTAAGTGAT TTAACTTCTG CTGGCGCGAA GTTAAACAC 
CACGACAGGT CTTTATGATC TGAAAACCGA AAATACCTTG TTAACTACCG ATGCTGCATT 
CGATAAATTA GGGAATGGCG ATAAAGTCAC CGTTGGCGGC GTAGATTATA CTTACAACGC 
TAAATCTGGT GATTTTACTA CCACCAAATC TACTGCTGGT ACGGGTGTAG ACGCCGCGGG 
GCAGGCTACT GATTCAGCTA AAAAACGTGA TGCGTTAGCT GCCACCCTTC ATGCTGATGT 
GGGTAAATCT GTTAATGGTT CTTACACCAC AAAAGATGGT ACTGTTTCTT TCGAAACGGA 
TTCAGCAGGT AATATCACCA TCGGTGGAAG CCAGGCATAC GTAGACGATG CAGGCAACTT 
GACGACTAAC AACGCTGGTA GCGCAGCTAA AGCTGATATG AAAGCGCTGC TTAAAGCCGC 
GAGCGAAGGT AGTGACGGTG CCTCTCTGAC ATTCAATGGC ACTGAATATA CTATCGCAAA 
AGCAACTCCT GCGACAACCT CTCCAGTAGC TCCGTTAATC CCTGGTGGGA TTTCTTATCA 
GGCTACAGTG AGTAAAGATG TAGTATTGAG CGAAACCAAA GCGGCTGCCG CGACATCTTC 
AATTACCTTT AATTCCGGTG TACTGAGCAA AACTATTGGG TTTACCGCGG GTGAATCCAG 
TGATGCTGCG AAGTCTTATG TGGATGATAA AGGTGGTATT ACTAACGTTG CCGACTATAC 
AGTCTCTTAC AGCGTTAACA AGGATAACGG CTCTGTGACT GTTGCCGGGT ATGCTTCAGC 
GACTGATACC AATAAAGATT ATGCTCCAGC AATTGGTACT GCTGTAAATG TGAACTCCGC 
GGGTAAAATC ACTACTGAGA CTACCAGTGC TGGTTCTGCA ACGACCAACC CGCTTGCTGC 
CCTGGACGAC GCTATCAGCT CCATCGACAA ATTCCGTTCT TCCCTGGGTG CTATCCAGAA 
CCGTCTGGAT TCCGCAGTCA CCAACCTGAA CAACACCACT ACCAACCTGT CTGAAGCGCA 
GTCCCGTATT CAGGACGCCG ACTATGCGAC CGAAGTGTCC AACATGTCGA AAGCGCAGAT 
TATCCAGCAG GCCGGTAACT CCGTGCTGGC AAAAGCCAAC CAGGTACCGC AGCAGGTTCT 
GTCTCTGCTG CAGGGTTAA 